[jira] Commented: (LUCENENET-383) System.IO.IOException: read past EOF while deleting the file from upload folder of filemanager.
[ https://issues.apache.org/jira/browse/LUCENENET-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967101#action_12967101 ] chaitanya commented on LUCENENET-383:
-
The error is thrown from the Lucene.Net.Store.BufferedIndexInput.Refill() method. Please see the relevant code below:

if (bufferLength <= 0) { throw new IOException("read past EOF"); }

But we don't know when this bufferLength becomes zero. Neal mentioned that it is possible that the document id they are attempting to delete from the Lucene index does not exist; maybe the above error comes from that case. But the file exists in the upload folder. What we found is that when deleting the file we get the above error, yet the file is still deleted. The annoying thing is that even when the file deletion succeeds, we still get this error. My feeling is that after deleting the file, Lucene searches for the id a second time; by then the id is no longer available, hence the error. If that is the situation, why is Lucene searching for this id twice for a single request?

System.IO.IOException: read past EOF while deleting the file from upload folder of filemanager.
---
Key: LUCENENET-383
URL: https://issues.apache.org/jira/browse/LUCENENET-383
Project: Lucene.Net
Issue Type: Bug
Environment: production
Reporter: chaitanya

We are getting System.IO.IOException: read past EOF when deleting the file from the upload folder of filemanager. It used to work fine earlier, but for the past few days we have been getting this error. We are using the EPiServer content management system, and EPiServer in turn uses Lucene for indexing.
Please find the stack trace of the error below. Help me to overcome this error. Thanks in advance.

[IOException: read past EOF]
Lucene.Net.Store.BufferedIndexInput.Refill() +233
Lucene.Net.Store.BufferedIndexInput.ReadByte() +21
Lucene.Net.Store.IndexInput.ReadInt() +13
Lucene.Net.Index.SegmentInfos.Read(Directory directory) +60
Lucene.Net.Index.AnonymousClassWith.DoBody() +45
Lucene.Net.Store.With.Run() +67
Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean closeDirectory) +110
Lucene.Net.Index.IndexReader.Open(String path) +65
EPiServer.Web.Hosting.Versioning.Store.FileOperations.DeleteItemIdFromIndex(String filePath, Object fileId) +78
EPiServer.Web.Hosting.Versioning.Store.FileOperations.DeleteFile(Object dirId, Object fileId) +118
EPiServer.Web.Hosting.Versioning.VersioningFileHandler.Delete() +28
EPiServer.Web.Hosting.VersioningFile.Delete() +118
EPiServer.UI.Hosting.UploadFile.ConfirmReplaceButton_Click(Object sender, EventArgs e) +578
EPiServer.UI.WebControls.ToolButton.OnClick(EventArgs e) +107
EPiServer.UI.WebControls.ToolButton.RaisePostBackEvent(String eventArgument) +135
System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, String eventArgument) +13
System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData) +36
System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +1565

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [VOTE] Release PyLucene 2.9.4-1 and 3.0.3-1
+1 to both. I installed both on Linux (Fedora 13) and ran my test Python script that indexes the first 100K line docs from Wikipedia and runs a few searches. No problems! Mike On Sun, Dec 5, 2010 at 1:50 AM, Andi Vajda va...@apache.org wrote: With the recent releases of Lucene Java 2.9.4 and 3.0.3, the PyLucene 2.9.4-1 and 3.0.3-1 releases closely tracking them are ready. Release candidates are available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_2_9/CHANGES http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_0/CHANGES All versions of PyLucene are built with the same version of JCC, currently version 2.7, included in these release artifacts. A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/CHANGES.txt http://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0/CHANGES.txt Please vote to release these artifacts as PyLucene 2.9.4-1 and 3.0.3-1. Thanks! Andi. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
Re: [VOTE] Release PyLucene 2.9.4-1 and 3.0.3-1
On Sun, Dec 5, 2010 at 1:50 AM, Andi Vajda va...@apache.org wrote: With the recent releases of Lucene Java 2.9.4 and 3.0.3, the PyLucene 2.9.4-1 and 3.0.3-1 releases closely tracking them are ready. Release candidates are available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_2_9/CHANGES http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_0/CHANGES All versions of PyLucene are built with the same version of JCC, currently version 2.7, included in these release artifacts. A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/CHANGES.txt http://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0/CHANGES.txt Please vote to release these artifacts as PyLucene 2.9.4-1 and 3.0.3-1. +1, everything looks in order, building pylucene and running 'make test' seemed fine on both versions.
Exception in migrating from 2.9.x to 3.0.2 on Android
The current code works on Android with 2.9.1, but fails with 3.0.2:

Directory dir = FSDirectory.open(file);
... do something with directory ...

The error we're seeing is:

12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: java.lang.management.ManagementFactory
12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:87)
12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:142)
12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.store.Directory.makeLock(Directory.java:106)
12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1058)

It turns out Android does not have java.lang.management.ManagementFactory. There are several workarounds in client code, but I'm not sure what is best. The bigger question is whether and how Lucene should be modified to accommodate this. Ultimately FSDirectory.open does the following:

if (Constants.WINDOWS) { return new SimpleFSDirectory(path, lockFactory); } else { return new NIOFSDirectory(path, lockFactory); }

Should Android be a supported client OS? If so, wouldn't it be better not to have OS-specific if-then-else and use reflection or something else? Thanks, DM - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
On 5 December 2010 00:16, DM Smith dm-sm...@woh.rr.com wrote: Should Android be a supported client OS? If so, wouldn't it be better not to have OS-specific if-then-else and use reflection or something else? Well, Lucene relies only on the standard JVM API. The fact that Android uses a non-standard JVM is IMHO outside the scope of Lucene. -- Gérard Dupont Information Processing Control and Cognition (IPCC) CASSIDIAN - an EADS company Document Learning team - LITIS Laboratory
RE: Exception in migrating from 2.9.x to 3.0.2 on Android
Hi DM,

In Lucene 3.0.3, NativeFSLockFactory no longer acquires a test lock and no longer needs the process ID, so the java.lang.management package is no longer used. In general, Lucene Java is compatible with the Java 5 SE specification. Android uses Harmony, and therefore we cannot guarantee compatibility, as Harmony is not TCK tested (but we do test with the latest versions; soon there will also be tests on Hudson with Harmony). Only the latest versions of Harmony are really compatible with Lucene, previous versions fail lots of tests (ask Robert), and Android phones use very antique versions of Harmony - it is not even certain that the Java 5 memory model is correctly implemented in Dalvik!

About 3.0.2: of course this version works even with the latest Harmony, so Harmony has the java.lang.management package (which is java.lang!!!), so the bug is in Android, simply by excluding an SE package. So you should open a bug report at Google and then hope that they fix it and that all the phone manufacturers like Motor-Roller will update their Android versions.

For your problem: the easy workaround is to use Lucene 3.0.3, or simply use another LockFactory (Android is single user, so even NoLockFactory would be fine in most cases). These are the same limitations as with the NFS filesystem. Just use FSDir.open(dir, lockFactory).

Uwe
-
Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

-----Original Message-----
From: DM Smith [mailto:dm-sm...@woh.rr.com]
Sent: Sunday, December 05, 2010 12:16 AM
To: dev@lucene.apache.org
Subject: Exception in migrating from 2.9.x to 3.0.2 on Android

The current code works on Android with 2.9.1, but fails with 3.0.2: Directory dir = FSDirectory.open(file); ... do something with directory ...
The error we're seeing is:

12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: java.lang.management.ManagementFactory
12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:87)
12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:142)
12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.store.Directory.makeLock(Directory.java:106)
12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1058)

It turns out Android does not have java.lang.management.ManagementFactory. There are several workarounds in client code, but I'm not sure what is best. The bigger question is whether and how Lucene should be modified to accommodate this. Ultimately FSDirectory.open does the following:

if (Constants.WINDOWS) { return new SimpleFSDirectory(path, lockFactory); } else { return new NIOFSDirectory(path, lockFactory); }

Should Android be a supported client OS? If so, wouldn't it be better not to have OS-specific if-then-else and use reflection or something else? Thanks, DM
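Uwe's suggested workaround — passing an explicit LockFactory so NativeFSLockFactory is never constructed — might be sketched as follows. This is a sketch against the Lucene 3.0.x API only; the class name is mine, and a real Android app would pick an index path under its private storage.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NoLockFactory;

public class AndroidDirectoryFactory {
    /** Opens an index directory without ever touching NativeFSLockFactory. */
    public static Directory open(File indexDir) throws IOException {
        // Android is effectively single-user, so skipping inter-process
        // locking (as Uwe suggests) should be acceptable in most cases.
        return FSDirectory.open(indexDir, NoLockFactory.getNoLockFactory());
    }
}
```

If some locking is still wanted, SimpleFSLockFactory (a plain lock-file implementation) also avoids the java.lang.management dependency.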
Lucene-Solr-tests-only-trunk - Build # 2218 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2218/

1 tests failed.

REGRESSION: org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:466)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors): [...truncated 8716 lines...]
[jira] Commented: (LUCENE-2798) Randomize indexed collation key testing
[ https://issues.apache.org/jira/browse/LUCENE-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966933#action_12966933 ] Robert Muir commented on LUCENE-2798:
-
Steven, before working too hard on the jdk collation tests, i just had this idea: are we sure we shouldn't deprecate the jdk collation functionality (remove in trunk) and only offer ICU? I was just thinking that the JDK Collator integration is basically a RAM trap due to its awful key size, etc.: http://site.icu-project.org/charts/collation-icu4j-sun

Randomize indexed collation key testing
---
Key: LUCENE-2798
URL: https://issues.apache.org/jira/browse/LUCENE-2798
Project: Lucene - Java
Issue Type: Test
Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
Priority: Minor
Fix For: 3.1, 4.0

Robert Muir noted on the #lucene IRC channel today that Lucene's indexed collation key testing is currently fragile (for example, the tests had to be revisited when Robert upgraded the ICU dependency in LUCENE-2797 because of Unicode 6.0 collation changes) and coverage is trivial (only 5 locales tested, and no collator options are exercised). This affects both the JDK implementation in {{modules/analysis/common/}} and the ICU implementation under {{modules/icu/}}. The key thing to test is that the order of the indexed terms is the same as that provided by the Collator itself. Instead of the current set of static tests, this could be achieved by indexing randomly generated terms' collation keys (and collator options) and then comparing the indexed terms' order to the order provided by the Collator over the original terms. Since different terms may produce the same collation key, however, the order of indexed terms is inherently unstable. When performing runtime collation, the Collator addresses the sort stability issue by adding a secondary sort over the normalized original terms.
In order to directly compare Collator's sort with Lucene's collation key sort, a secondary sort will need to be applied to Lucene's indexed terms as well. Robert has suggested indexing the original terms in addition to their collation keys, then using a Sort over the original terms as the secondary sort. Another complication: Lucene 3.X uses Java's UTF-16 term comparison, and trunk uses UTF-8 order, so the implemented secondary sort will need to respect that. From #lucene:

{quote}
rmuir__: so i think we have to on 3.x, sort the 'expected list' with Collator.compare, if thats equal, then as a tiebreak use String.compareTo
rmuir__: and in the index sort on the collated field, followed by the original term
rmuir__: in 4.x we do the same thing, but dont use String.compareTo as the tiebreak for the expected list
rmuir__: instead compare codepoints (iterating character.codepointAt, or comparing .getBytes(UTF-8))
{quote}
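The UTF-16 vs. code point distinction above only matters for supplementary characters, but it is exactly what the trunk tiebreak has to get right. A stdlib-only sketch of a code point (UTF-8-compatible) order comparator — the class and field names are mine, not from the patch:

```java
import java.util.Comparator;

public class CodePointOrder {
    // Compares strings by Unicode code point, matching UTF-8 byte order,
    // as opposed to String.compareTo's UTF-16 code unit order.
    static final Comparator<String> CODE_POINT_ORDER = (a, b) -> {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i), cb = b.codePointAt(j);
            if (ca != cb) return ca - cb;   // code points are non-negative: no overflow
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Shorter string (a prefix of the other) sorts first.
        return (a.length() - i) - (b.length() - j);
    };
}
```

For instance, U+10400 (the surrogate pair "\uD801\uDC00") sorts before "\uFFFD" under String.compareTo (surrogates start at 0xD800) but after it in code point order, so the two secondary sorts genuinely disagree.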
[jira] Commented: (LUCENE-2763) Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966943#action_12966943 ] Robert Muir commented on LUCENE-2763:
-
+1, looks good to me.

Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer
---
Key: LUCENE-2763
URL: https://issues.apache.org/jira/browse/LUCENE-2763
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
Fix For: 3.1, 4.0
Attachments: LUCENE-2763.patch

Currently, in addition to implementing the UAX#29 word boundary rules, StandardTokenizer recognizes email addresses and URLs, but doesn't provide a way to turn this behavior off and/or provide overlapping tokens with the components (username from email address, hostname from URL, etc.). UAX29Tokenizer should become StandardTokenizer, and the current StandardTokenizer should be renamed to something like UAX29TokenizerPlusPlus (or something like that). For rationale, see [the discussion at the reopened LUCENE-2167|https://issues.apache.org/jira/browse/LUCENE-2167?focusedCommentId=12929325page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12929325].
Re: Lucene-Solr-tests-only-trunk - Build # 2221 - Failure
Well, darn, upgrading jetty didn't seem to help this. -Yonik http://www.lucidimagination.com

On Sun, Dec 5, 2010 at 7:05 AM, Apache Hudson Server hud...@hudson.apache.org wrote:

Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2221/
1 tests failed.
REGRESSION: org.apache.solr.TestDistributedSearch.testDistribSearch
Error Message: Some threads threw uncaught exceptions!
Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:466)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)
Build Log (for compile errors): [...truncated 8716 lines...]
Re: Lucene-Solr-tests-only-trunk - Build # 2221 - Failure
On Sun, Dec 5, 2010 at 9:00 AM, Yonik Seeley yo...@lucidimagination.com wrote: Well, darn upgrading jetty didn't seem to help this. I was getting really hopeful for a while!
Re: Lucene-Solr-tests-only-trunk - Build # 2211 - Failure
On Sun, Dec 5, 2010 at 1:46 AM, Apache Hudson Server hud...@hudson.apache.org wrote: Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2211/ 1 tests failed. REGRESSION: org.apache.solr.update.AutoCommitTest.testMaxTime

There's still a timing issue in this test, I think. I modified it a while ago to make it better, but Hoss mentioned on the mailing list some way we could change it to not be fragile...
[jira] Assigned: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned SOLR-1979:
-
Assignee: Grant Ingersoll

Create LanguageIdentifierUpdateProcessor
---
Key: SOLR-1979
URL: https://issues.apache.org/jira/browse/SOLR-1979
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
Attachments: SOLR-1979.patch

We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is being able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this:

{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code}

It will then read the text from the inputFields name and subject, perform language identification, and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used.
[jira] Resolved: (SOLR-2244) Add Language Identification support
[ https://issues.apache.org/jira/browse/SOLR-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-2244.
---
Resolution: Won't Fix

Actually, I'm going to switch back to SOLR-1979, as it is a superset of this patch. I should have a patch up shortly.

Add Language Identification support
---
Key: SOLR-2244
URL: https://issues.apache.org/jira/browse/SOLR-2244
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Attachments: solr2244.patch

For starters, Tika has language identification capabilities that we can likely leverage, but moreover, make it easier for people to plug in language identification into the indexing process.
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966955#action_12966955 ] Grant Ingersoll commented on SOLR-1979:
---
See http://wiki.apache.org/solr/LanguageDetection for the start of documentation.

bq. isReasonablyCertain() always returns false

See TIKA-568.
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966963#action_12966963 ] Robert Muir commented on LUCENE-2793:
-
There is another problem we should solve here, and that is the buffer size problem. This is totally broken at the moment for custom directories; here's an example. I wanted to set the buffer size by default to 4096 (since I measured this to be about a 20% improvement for my directory impl). Looking at the APIs, you would think that you simply override the openInput that takes no buffer size, like this:

{noformat}
@Override
public IndexInput openInput(String name) throws IOException {
  return openInput(name, 4096);
}
{noformat}

Unfortunately this doesn't work at all! Instead you have to do something like this for it to actually work:

{noformat}
@Override
public IndexInput openInput(String name, int bufferSize) throws IOException {
  ensureOpen();
  return new IndexInput(name, Math.max(bufferSize, 4096));
}
{noformat}

The problem is that throughout Lucene's APIs the directory's default is never used; instead the static BufferedIndexInput.BUFFER_SIZE is used everywhere, e.g. SegmentReader.get:

{noformat}
public static SegmentReader get(boolean readOnly, SegmentInfo si, int termInfosIndexDivisor) throws CorruptIndexException, IOException {
  return get(readOnly, si.dir, si, BufferedIndexInput.BUFFER_SIZE, true, termInfosIndexDivisor);
}
{noformat}

So I think Lucene's APIs should never specify a buffer size; we should remove it completely from the codecs API, and it should be *replaced* with IOContext.

Directory createOutput and openInput should take an IOContext
---
Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless

Today for merging we pass down a larger readBufferSize than for searching because we get better performance.
I think we should generalize this to a class (IOContext), which would hold the buffer size but could then also hold other flags like DIRECT (bypass the OS's buffer cache), SEQUENTIAL, etc. Then we can make DirectIOLinuxDirectory fully usable, because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc., if that were somehow possible.
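The pitfall Robert describes is general: framework code calls the parameterized overload directly with a static default, so overriding only the convenience overload has no effect. A stdlib-only sketch of the same shape (the class and method names are illustrative, not Lucene's; the int return stands in for the opened input's buffer size):

```java
abstract class Dir {
    // Plays the role of the static BufferedIndexInput.BUFFER_SIZE default.
    static final int DEFAULT_BUFFER_SIZE = 1024;

    // Convenience overload; framework code rarely calls this one.
    int openInput() { return openInput(DEFAULT_BUFFER_SIZE); }

    // Framework code calls this overload directly, passing the static default.
    abstract int openInput(int bufferSize);
}

class MyDir extends Dir {
    // Overriding openInput() alone would be bypassed entirely; the working
    // fix is to clamp inside the parameterized method, as Robert shows.
    @Override
    int openInput(int bufferSize) { return Math.max(bufferSize, 4096); }
}
```

Calling new MyDir().openInput(Dir.DEFAULT_BUFFER_SIZE) still yields 4096, which is exactly what an override of the no-arg method alone cannot guarantee.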
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966964#action_12966964 ] Jan Høydahl commented on SOLR-1979:
---
Simply allowing the threshold for isReasonablyCertain() to be set is probably not enough to get robust detection. This is because the distance measure is very sensitive to the length of the profiles in use. Thus it is a bit dangerous to expose getDistance() as in TIKA-568, because that distance measure is kind of an internal value, not very normalized, and is bound to change in future versions of Tika. See TIKA-369 and TIKA-496. I think the right way to go is to solve these two issues first. By fixing getDistance() so that it is not biased towards profile length, we can make a new isReasonablyCertain() implementation that takes into account the relative distance between the first and second candidate languages...
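Jan's proposed heuristic — judge certainty by how far the best candidate is ahead of the runner-up, rather than by an absolute threshold — might look like the following stdlib-only sketch. The names and the margin value are illustrative, not Tika API; distances follow Tika's convention where lower is better.

```java
import java.util.Arrays;

public class RelativeCertainty {
    /**
     * Certain when the runner-up's distance exceeds the best candidate's by a
     * relative margin; e.g. margin = 0.5 requires the runner-up to be at least
     * 50% worse than the winner.
     */
    static boolean isReasonablyCertain(double[] distances, double margin) {
        if (distances.length < 2) return distances.length == 1;
        double[] sorted = distances.clone();
        Arrays.sort(sorted);  // sorted[0] = best (smallest) distance
        return sorted[1] - sorted[0] >= margin * sorted[0];
    }
}
```

With distances {0.10, 0.30, 0.40} and margin 0.5, the winner is far enough ahead to be certain; with {0.28, 0.30, 0.40}, it is too close to call.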
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966970#action_12966970 ] Jan Høydahl commented on SOLR-1979:
---
The idField input parameter is just used for decent logging if detection fails. It would be more elegant to get the id field name automatically through SolrCore...
[jira] Commented: (SOLR-2158) TestDistributedSearch.testDistribSearch fails often
[ https://issues.apache.org/jira/browse/SOLR-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966971#action_12966971 ] Yonik Seeley commented on SOLR-2158:
-
OK, so we upgraded jetty... but the "failed to respond" exception still happens. Just to try and narrow things down, I put a long sleep inside Solr request handling and then tried a distributed search... it worked fine. So it doesn't appear to be something getting hung up in Solr. Remaining suspects:
- a jetty bug
- an embedded jetty bug
- an HttpClient bug
- a bug in the way Solr uses HttpClient

Another data point: with my load testing tool, I can run millions of requests against Jetty/Solr (and I just did again). It doesn't use HttpClient though, and it uses GET instead of POST.

Some things to try:
- Modify the load tool to use POST and verify things still work
- Put a long pause in TestDistributedSearch after the Solr servers are brought up, and then try load testing against those servers with an external tool.
- if this fails, we know it's an issue with how we embed Jetty
- Make a load testing tool that uses SolrJ exactly the way that distributed search uses it, and try it on a normal Solr server
- if this fails, it could be an HttpClient bug, or a jetty bug tickled by HttpClient specifically
- if this fails, make a small self-contained load tool that uses only HttpClient to remove the possibility of SolrJ bugs

TestDistributedSearch.testDistribSearch fails often
---
Key: SOLR-2158
URL: https://issues.apache.org/jira/browse/SOLR-2158
Project: Solr
Issue Type: Bug
Components: Build
Affects Versions: 3.1, 4.0
Environment: Hudson
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: TEST-org.apache.solr.TestDistributedSearch.txt

TestDistributedSearch.testDistribSearch fails often in hudson, with some threads throwing uncaught exceptions.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966972#action_12966972 ] Robert Muir commented on SOLR-1979:
---
bq. cause that distance measure is kind of an internal value, not very normalized and is bound to change in future versions of TIKA.
bq. we can make a new isReasonablyCertain() implementation taking into account the relative distance between first and second candidate languages...

I don't follow the logic: if it's not very normalized, then it seems like this approach doesn't tell you anything. Language 1 could be uncertain, and language 2 just completely uncertain, but that tells you nothing: isn't it like trying to determine if a good Lucene search result score is certainly a hit? That's not really the right way to go.

For example: consider the case where the language isn't supported at all by Tika (I don't see a list of supported languages anywhere, by the way!). It would be good for us to know that the detection is uncertain at all; how relatively uncertain it is with regard to the next language is not very important. I think it's also important that we be able to get this uncertainty, or whatever metric, agnostic of the implementation. For example, we should be able to somehow think of chaining detectors... It's really important to cheat and not use heuristics for languages that don't need them.

For example, disregarding some strange theoretical/historical cases, you can simply look at the Unicode properties of the document to determine that it's in Greek, as Greek is basically the only modern language using the Greek alphabet.

Create LanguageIdentifierUpdateProcessor
---
Key: SOLR-1979
URL: https://issues.apache.org/jira/browse/SOLR-1979
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
Attachments: SOLR-1979.patch

We need the ability to detect language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this:
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code}
It will then read the text from inputFields name and subject, perform language identification, and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used.
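Robert's Greek example can be sketched in a few lines. This is an illustration of the "cheat" for script-exclusive languages, not Tika's implementation; the class name and the 90% threshold are made up for the example (Character.UnicodeScript requires Java 7+):

```java
// Sketch of the "cheat" for script-exclusive languages: if nearly all
// letters in a text are in the Greek script, modern Greek is the only
// plausible language, so no statistical model is needed.
// Illustrative only; the threshold and class name are hypothetical.
public class ScriptCheat {
    static boolean isMostlyGreek(String text) {
        int letters = 0, greek = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                letters++;
                if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.GREEK) {
                    greek++;
                }
            }
            i += Character.charCount(cp);
        }
        return letters > 0 && greek * 10 >= letters * 9;  // at least 90% Greek letters
    }

    public static void main(String[] args) {
        System.out.println(isMostlyGreek("Καλημέρα κόσμε"));  // Greek text
        System.out.println(isMostlyGreek("Hello world"));     // Latin text
    }
}
```

A chained detector could run cheap script checks like this first and fall through to a statistical detector only for scripts shared by many languages (Latin, Cyrillic, Arabic).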
[jira] Issue Comment Edited: (SOLR-2158) TestDistributedSearch.testDistribSearch fails often
[ https://issues.apache.org/jira/browse/SOLR-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966718#action_12966718 ] Yonik Seeley edited comment on SOLR-2158 at 12/5/10 10:38 AM:
--
Moving Robert's stack trace from the description to the comments.
{code}
[junit] Testsuite: org.apache.solr.TestDistributedSearch
[junit] Testcase: testDistribSearch(org.apache.solr.TestDistributedSearch): FAILED
[junit] Some threads threw uncaught exceptions!
[junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
[junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:416)
[junit] at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:76)
[junit] at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)
[junit]
[junit]
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 382.297 sec
[junit]
[junit] - Standard Error -
[junit] 2010. 10. 15 ?? 2:08:04 org.apache.solr.common.SolrException log
[junit] ??: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
[junit] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:318)
[junit] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
[junit] at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)
[junit] at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
[junit] at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
[junit] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
[junit] at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
[junit] at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
[junit] at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
[junit] at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
[junit] at org.mortbay.jetty.Server.handle(Server.java:326)
[junit] at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
[junit] at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
[junit] at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
[junit] at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
[junit] at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
[junit] at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
[junit] at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
[junit] Caused by: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
[junit] at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:297)
[junit] at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:513)
[junit] at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:478)
[junit] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
[junit] at java.util.concurrent.FutureTask.run(FutureTask.java:166)
[junit] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
[junit] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
[junit] at java.util.concurrent.FutureTask.run(FutureTask.java:166)
[junit] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
[junit] at java.lang.Thread.run(Thread.java:636)
[junit] Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Operation timed out
[junit] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
[junit] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
[junit] at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:274)
[junit] ... 10 more
[junit] Caused by: java.net.ConnectException: Operation timed out
[junit] at
[jira] Updated: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1979:
--
Attachment: SOLR-1979.patch

I took Jan's and Tommaso's patches and reworked them a bit. It seems to me that there isn't much point in merely identifying the language if you aren't going to do something about it. So, this patch builds on what Jan and Tommaso did and will remap the input fields to new per-language fields (note, we could make this optional). I also tried to standardize the input parameters a bit. I dropped the outputField setting and a number of other settings, and I made the language detection per input field.

The basic gist is that if you input two fields, name and subject, it will detect the language of each field and then attempt to map them to a new field. The new field is made by concatenating the original field name with "_" + the ISO 639 code. For example, if en is the detected language, then the new field for name would be name_en. If that field doesn't exist, it will fall back to the original field (i.e. name).

Left to do:
# Fix the tests. I don't like how we currently test UpdateProcessorChains. It should not require writing your own little piece of update mechanism. You should be able to simply set up the appropriate configuration, hook it into an update handler and then hit that update handler.
# Need to check the license headers, builds, etc.
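Grant's remapping rule (original field name + "_" + detected ISO 639 code, with a fallback to the original field when the per-language field doesn't exist in the schema) can be sketched as follows. This is a hypothetical helper written for illustration, not code from the SOLR-1979 patch:

```java
import java.util.Set;

// Sketch of the field-remapping rule described above: "name" with detected
// language "en" maps to "name_en" if the schema has such a field, otherwise
// it falls back to the original field name. Hypothetical helper, not the
// actual SOLR-1979 patch code.
public class LangFieldMapper {
    private final Set<String> schemaFields;

    LangFieldMapper(Set<String> schemaFields) {
        this.schemaFields = schemaFields;
    }

    String mapField(String field, String langCode) {
        String candidate = field + "_" + langCode;  // e.g. name_en
        return schemaFields.contains(candidate) ? candidate : field;
    }

    public static void main(String[] args) {
        LangFieldMapper m = new LangFieldMapper(Set.of("name_en", "subject_en"));
        System.out.println(m.mapField("name", "en"));  // per-language field exists
        System.out.println(m.mapField("name", "de"));  // falls back to original
    }
}
```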
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966978#action_12966978 ] Robert Muir commented on SOLR-1979:
---
We really need to not be using ISO 639-1 here. It's not expressive enough: for example, it doesn't differentiate between Simplified and Traditional Chinese, yet SmartChineseAnalyzer only works on Simplified. I would like to see RFC 3066 instead.
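The distinction Robert is pointing at can be seen with java.util.Locale, which works with the richer RFC 3066-style tags (and their BCP 47 successor): a bare ISO 639-1 code collapses both Chinese variants to "zh", while a language tag keeps them apart via a region (or script) subtag. Illustration only; toLanguageTag requires Java 7+:

```java
import java.util.Locale;

// Why bare ISO 639-1 is lossy: Simplified and Traditional Chinese both
// collapse to "zh", while RFC 3066-style tags keep the distinction.
public class LangTags {
    public static void main(String[] args) {
        Locale simplified = Locale.SIMPLIFIED_CHINESE;    // zh_CN
        Locale traditional = Locale.TRADITIONAL_CHINESE;  // zh_TW
        System.out.println(simplified.getLanguage());     // 639-1 code: zh
        System.out.println(traditional.getLanguage());    // 639-1 code: zh (identical)
        System.out.println(simplified.toLanguageTag());   // tag: zh-CN (distinct)
        System.out.println(traditional.toLanguageTag());  // tag: zh-TW
    }
}
```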
Re: Changes Mess
Hi Mark,

RE: the credit system. JIRA provides a contribution report, like this one that I generated for Lucene 3.1: http://s.apache.org/BpL Just click on Reports > Contribution Report in the upper right of JIRA on the main project summary page. We've been using this in Tika since the beginning to indicate contributions from folks and it's worked well.

Cheers, Chris

On Dec 4, 2010, at 10:03 PM, Mark Miller wrote:

I like this idea myself - it would encourage better JIRA summaries and reduce duplication. It's easy to keep a mix of old and new too - keep the things that Grant mentions in CHANGES.txt (back-compat migration, misc info), but you can also just export a text Changes from JIRA at release and add that (along with a link). Certainly nice to have a 'hard' copy. https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12315147&styleName=Text&projectId=12310110&Create=Create

The only thing I don't like is the loss of the current credit system - I like that better than the crawl-through-JIRA method. I think prominent credits are a good encouragement for new contributors. Any comments on that?

- Mark

On 12/2/10 11:46 AM, Grant Ingersoll wrote:

I think we should drop the item-by-item change list and instead focus on 3 things:
1. Prose describing the new features (see Tika's changes file for instance) and things users should pay special attention to, such as when they might need to re-index.
2. Calling out explicit compatibility breaks.
3. A pointer to the full list of changes in JIRA. Alternatively, I believe there is a way in JIRA to export/generate a summary of all issues fixed.

#1 can be done right before release simply by going through #3 and doing the appropriate wordsmithing. #2 should be tracked as it is found. It's kind of silly that we have all this duplication of effort built in, not to mention having to track it across two branches.
We do this over in Mahout and I think it works pretty well and reduces the duplication quite a bit, since everything is already in JIRA and JIRA produces nice summaries too. It also encourages people to track things better in JIRA. #1 above also lends itself well as the basis of press releases/blogs/etc.

-Grant

On Dec 1, 2010, at 11:54 AM, Michael McCandless wrote:

So, going forward... When committing an issue that needs a changes entry, where are we supposed to put it? E.g. if it's a bug fix that we'll backport all the way to 2.9.x... where does it go? If it's a new feature/API that's going to 3.x and trunk... only in 3.x's CHANGES?

Mike

On Wed, Dec 1, 2010 at 9:22 AM, Uwe Schindler u...@thetaphi.de wrote:

Hi all, when merging changes done in 2.9.4/3.0.3 with current 3.x and trunk, I found that the 3.x changes differ immensely between trunk's changes.txt and the 3.x changes.txt. Some entries are missing in the 3.x branch but are available in trunk's 3.x part, and other entries using new trunk class names sit among the 3.x changes in trunk. I copied the 3.x branch CHANGES.txt over trunk's 3.x section and attached a patch of this. What should we do? It's messy :( Most parts seem to be merge failures. We should go through all those diff'ed issues, check where they were really fixed (3.x or trunk), and move the entries accordingly. After that, the 3.x branch and trunk's 3.x section of CHANGES.txt should have identical text!
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

--
Grant Ingersoll
http://www.lucidimagination.com

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Re: Changes Mess
On Sun, Dec 5, 2010 at 12:08 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Mark, RE: the credit system. JIRA provides a contribution report here, like this one that I generated for Lucene 3.1:

My concern with this is that it leaves out important email contributors. For example, if a user reports a bug, we typically include their name in CHANGES.txt. The user who reports the bug does the hard work of finding that there is a bug and reporting it to us. Additionally, sometimes they do extra stuff: boiling the problem down to a certain piece of code or a test case, etc., even if they don't know how to fix the bug. Then again, maybe they are a Solr user who doesn't even know the Java programming language but finds a nasty bug in Lucene. In all cases I think if a user finds a bug and we fix it, it's important we credit them, as we should encourage people to find bugs :)
[jira] Created: (SOLR-2266) java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord()
java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord()
---
Key: SOLR-2266
URL: https://issues.apache.org/jira/browse/SOLR-2266
Project: Solr
Issue Type: Bug
Affects Versions: 1.4.1
Environment: Mac OS 10.6, java version 1.6.0_22, Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261), Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
Reporter: Peter Wolanin

I have been testing a switch to long and tdate instead of int and date fields in the schema.xml for our Drupal integration. This indexes fine, but search fails with a 500 error.
{code}
INFO: [d7] webapp=/solr path=/select params={spellcheck=true&facet=true&facet.mincount=1&indent=1&spellcheck.q=term&json.nl=map&wt=json&rows=10&version=1.2&fl=id,entity_id,entity,bundle,bundle_name,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=term&bf=recip(rord(created),4,19,19)^200.0} status=500 QTime=4
Dec 5, 2010 11:52:28 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 39
at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:721)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:61)
at org.apache.solr.search.function.TopValueSource.getValues(TopValueSource.java:57)
at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:123)
at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1101)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at com.acquia.search.HmacFilter.doFilter(HmacFilter.java:62)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
{code}
The exception goes away if I remove the boost function param bf=recip(rord(created),4,19,19)^200.0. Omitting the recip() doesn't help, so just bf=rord(created)^200.0 still causes the exception. In
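For reference, the recip(x,m,a,b) function in the failing boost param computes a/(m*x+b) over the reverse ordinal of the created field. Since the crash happens while building the field-cache string index itself (StringIndexCache.createValue), any function over rord(created) triggers it, which matches the observation that bf=rord(created)^200.0 alone still fails. A standalone sketch of the formula (the formula is documented Solr behavior; the class here is written just for illustration):

```java
// recip(x,m,a,b) computes a / (m*x + b). With m=4, a=19, b=19 over a
// reverse ordinal, the newest document (rord = 0) scores 19/19 = 1.0 and
// older documents decay toward 0. Standalone sketch, not Solr's
// ReciprocalFloatFunction class.
public class Recip {
    static float recip(float x, float m, float a, float b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        System.out.println(recip(0f, 4f, 19f, 19f));    // newest doc: 19/19 = 1.0
        System.out.println(recip(100f, 4f, 19f, 19f));  // older doc: 19/419
    }
}
```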
[jira] Resolved: (LUCENE-1541) Trie range - make trie range indexing more flexible
[ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1541.
---
Resolution: Won't Fix

I don't think a fix is needed anymore.

Trie range - make trie range indexing more flexible
---
Key: LUCENE-1541
URL: https://issues.apache.org/jira/browse/LUCENE-1541
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/*
Affects Versions: 2.9
Reporter: Ning Li
Assignee: Uwe Schindler
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-1541.patch, LUCENE-1541.patch

In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32). We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.
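The terms-per-value arithmetic behind the trade-off in this issue: indexing a 64-bit value with precision step p produces ceil(64/p) terms, so the issue's figures (8 terms at step 8, 32 terms at step 2) follow directly. A small sketch of that count (illustrative helper, not Lucene code):

```java
// Terms-per-value arithmetic behind the trie-range trade-off: a 64-bit
// value at precision step p is indexed in ceil(64/p) terms. Larger steps
// mean fewer terms per value, but range queries must enumerate more terms
// at the lowest precision. Illustrative helper, not Lucene code.
public class TrieTerms {
    static int termsPerValue(int bits, int precisionStep) {
        return (bits + precisionStep - 1) / precisionStep;  // ceil(bits / p)
    }

    public static void main(String[] args) {
        System.out.println(termsPerValue(64, 8));  // 8, as in the issue text
        System.out.println(termsPerValue(64, 2));  // 32, as in the issue text
    }
}
```

The proposed option (different precision steps per precision level) would let an expert mix, say, a coarse step near the top of the trie with fine steps near the leaves.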
Re: Path to jquery?
You are quite right. I put a bug into JIRA; basically the layout.vm was referring to an older version of jQuery than what was in the solr.war file! I do think, though, that having everything all in the /velocity directory would make it easier for someone who is new to Solr to grok how to customize the /browse interface! Most folks do NOT want to be adding/hacking files in the solr.war; they just want to use what is distributed!

Eric

On Dec 2, 2010, at 4:45 PM, Ryan McKinley wrote:

jquery is actually in the .war file, so you read it directly from the server. The file?file=/velocity... request streams content from inside your solr configuration directory.

On Thu, Dec 2, 2010 at 10:35 AM, Eric Pugh ep...@opensourceconnections.com wrote:

Hi all, Looking at Solr 3.x, it seems like the path to jquery fails if you are using multicore. In layout.vm there is:

<script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.2.3.min.js"></script>

However, for other files it is specified via:

<script type="text/javascript" src="#{url_for_solr}/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript"></script>

Shouldn't the URL for jquery be the same as for jquery.autocomplete.js, and the file be packaged in the /velocity directory as well?
Eric

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal
RE: Changes Mess
On 12/5/2010 at 12:19 PM, Robert Muir wrote:
On Sun, Dec 5, 2010 at 12:08 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:
Hi Mark, RE: the credit system. JIRA provides a contribution report here, like this one that I generated for Lucene 3.1:
My concern with this is that it leaves out important email contributors.

I agree, this is a serious problem. My additional problems with JIRA-generated changes:

1. Huge undifferentiated change lists are frightening and nearly useless, regardless of the quality of the descriptions. JIRA's issue types are: Bug, New Feature, Improvement, Test, Wish, Task. Even if we used JIRA's issue types to group issues, they are not the same as Lucene's CHANGES.txt issue types: Changes in backwards compatibility policy, Changes in runtime behavior, API Changes, Documentation, Bug fixes, New features, Optimizations, Build, Test Cases, Infrastructure. (I left out Requirements, last used in 2006 under release 1.9 RC1, since Build seems to have replaced it.)

2. There are now four separate CHANGES.txt files in the Lucene code base, excluding Solr and its modules (each of which has one of them). This number will only grow as more Lucene contribs become modules. The JIRA project components list is outdated / incomplete / has different granularity than the CHANGES.txt locations, so using it to group JIRA issues would not work, because the components don't align with Lucene/Solr components.

3. Some of the CHANGES.txt entries draw from multiple JIRA issues. From dev/trunk/lucene/CHANGES.txt: Trunk: 9 out of 56 include multiple JIRA issues; 3.X: 7/94; 3.0.0: 3/29; 2.9.0: 9/153. I'm assuming a JIRA dump can't do this.

4. Some JIRA issues appear under multiple change categories in CHANGES.txt. From dev/trunk/lucene/CHANGES.txt: Trunk: 3 out of 68 multiply categorized; 3.X: 9/102; 3.0.0: 1/53; 2.9.0: 20/166. A JIRA dump would not allow for multiple issue categorization, since JIRA only allows a single issue type to be assigned - I guess they are assumed to be mutually exclusive.

Maybe our use of JIRA could be changed to address some of these problems, through addition of new fields and/or modification of existing fields' allowable values?

Steve
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967010#action_12967010 ] Grant Ingersoll commented on SOLR-1979:
---
bq. I would like to see RFC 3066 instead

Yeah, that makes sense; however, I believe Tika returns 639. (Tika doesn't recognize Chinese yet at all.) One approach is we could normalize, I suppose. Another is to fix Tika. I'd really like to see Tika support more languages, too.

Longer term, I'd like to not do the fieldName_LangCode thing at all and instead let the user supply a string that could have variable substitution if they want, something like fieldName_${langCode}, or it could be ${langCode}_fieldName, or it could just be another literal.
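Grant's ${langCode} substitution idea could look something like the following. This is a hypothetical illustration of the suggestion, not code from any patch; the template placeholder names are assumptions:

```java
// Sketch of the suggestion above: instead of hard-coding fieldName_langCode,
// let users supply a template with ${langCode} (and, here, ${fieldName})
// placeholders. Hypothetical illustration; placeholder names are made up.
public class FieldTemplate {
    static String expand(String template, String fieldName, String langCode) {
        return template.replace("${fieldName}", fieldName)
                       .replace("${langCode}", langCode);
    }

    public static void main(String[] args) {
        System.out.println(expand("${fieldName}_${langCode}", "name", "en"));  // suffix style
        System.out.println(expand("${langCode}_${fieldName}", "name", "en"));  // prefix style
        System.out.println(expand("text_general", "name", "en"));              // plain literal
    }
}
```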
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967011#action_12967011 ] Grant Ingersoll commented on SOLR-1979:
---
Another thought here is that, over time, this class becomes a base class and it becomes easy to replace the language detection piece; that way one gets all the infrastructure of this class but can plug in their own detection. In fact, I'm going to do that right now.
Monitoring the UI's mem usage
This shouldn't normally be something that you need to do with JRuby, I'd think - but Avram asked about this on the call back when there were UI out-of-memory issues. Since we require Java 6, this is actually really easy. Java itself comes with jconsole; it should be on your path. You just start it, and it lists running Java processes that you can connect to. Choose the one with jruby-complete-1.5.3.jar in the name for the UI. The back end is the one with start.jar in the name.

I usually prefer VisualVM over jconsole (kind of a souped-up version of jconsole with a mem/cpu profiler). It's free and simple to use: https://visualvm.dev.java.net/. That makes it very easy to see how the UI and back end are using memory, their garbage-collection activity, CPU usage, etc. I often run one on my laptop screen as I test LWE.

- Mark
Testing UpdateProcessorChain
Anyone have any thoughts on testing UpdateProcessorChain (and Factory)? Looking at the Signature (dedup) tests, it seems a little clunky, yet the Solr base test class adoc (and related methods) doesn't seem to support specifying the update handler to hit. Thoughts?

-Grant
Re: Monitoring the UI's mem usage
Gotta love a wrong email-address autocomplete.

On 12/5/10 3:26 PM, Mark Miller wrote:
> This shouldn't normally be something that you need to do with jruby I'd think [...]

- Mark
Re: Testing UpdateProcessorChain
On Sun, Dec 5, 2010 at 3:28 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> Anyone have any thoughts on testing UpdateProcessorChain (and Factory)?

You can specify an alternate update processor with any update command. SolrTestCaseJ4 has this:

  public static String add(XmlDoc doc, String... args)

so... you should be able to do something like:

  add(doc("id", "10"), "update.processor", "foo")

-Yonik
http://www.lucidimagination.com
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
Thanks Uwe (and others). We'll adapt.

Is there any interest here in knowing if there are any other problems regarding Lucene on Android? From what I see, it is the first mobile platform on which Lucene can run.

-- DM

On Dec 5, 2010, at 5:16 AM, Uwe Schindler wrote:

Hi DM,

In Lucene 3.0.3, NativeFSLockFactory no longer acquires a test lock and does not need the process ID anymore, so the java.lang.management package is no longer used.

In general, Lucene Java is compatible with the Java 5 SE specification. Android uses Harmony, and therefore we cannot guarantee compatibility, as Harmony is not TCK-tested (but we do test with the latest versions; soon there will also be tests on Hudson with Harmony). But only the latest versions of Harmony are really compatible with Lucene - previous versions fail lots of tests (ask Robert) - and Android phones use very antique versions of Harmony. It is not even certain that the Java 5 memory model is correctly implemented in Dalvik!

About 3.0.2: of course this version works even with the latest Harmony, so Harmony has the java.lang.management package (which is java.lang!!!), so the bug is in Android, simply by excluding an SE package. So you should open a bug report at Google and then hope that they fix it and that all the phone manufacturers like Motor-Roller will update their Android versions.

For your problem: the easy workaround is using Lucene 3.0.3 or simply using another LockFactory (Android is single-user, so even NoLockFactory would be fine in most cases). These are the same limitations as with the NFS filesystem. Just use FSDir.open(dir, lockFactory).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: DM Smith [mailto:dm-sm...@woh.rr.com]
Sent: Sunday, December 05, 2010 12:16 AM
To: dev@lucene.apache.org
Subject: Exception in migrating from 2.9.x to 3.0.2 on Android

The current code that works on Android with 2.9.1, but fails with 3.0.2:

  Directory dir = FSDirectory.open(file);
  ... do something with directory ...

The error we're seeing is:

  12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: java.lang.management.ManagementFactory
  12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:87)
  12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:142)
  12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.store.Directory.makeLock(Directory.java:106)
  12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1058)

Turns out Android does not have java.lang.management.ManagementFactory. There are several workarounds in client code, but I'm not sure what is best. The bigger question is whether and how Lucene should be modified to accommodate this. Ultimately FSDirectory.open does the following:

  if (Constants.WINDOWS) {
    return new SimpleFSDirectory(path, lockFactory);
  } else {
    return new NIOFSDirectory(path, lockFactory);
  }

Should Android be a supported client OS? If so, wouldn't it be better not to have OS-specific if-then-else and use reflection or something else?

Thanks,
DM
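DM's closing suggestion - probing for a capability via reflection rather than branching on the OS - can be sketched standalone. This is a hypothetical illustration (the class name ClassProbe and the method isPresent are mine, not Lucene's): a factory could test whether an optional JDK class exists before choosing an implementation that needs it.

```java
// Hypothetical sketch: detect an optional JDK class at runtime instead of
// assuming the platform provides it. On a standard JDK the management
// probe succeeds; on Android (Dalvik), which omits java.lang.management,
// it would fail and the caller could fall back to a simpler LockFactory.
public class ClassProbe {
    static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // true on a standard JDK, false on platforms that exclude the package
        System.out.println(isPresent("java.lang.management.ManagementFactory"));
    }
}
```

The same pattern would let FSDirectory.open pick SimpleFSDirectory as a safe fallback whenever the fancier implementation's dependencies are missing, rather than enumerating operating systems.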
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
I have an interest - I don't really care if it uses true Java or not. I say keep it coming. Where/if it makes sense, why not make Lucene work better with it? Perhaps that is not possible or too difficult in every case - but I'd still like to see the cases pop up. Better than those spam wiki update emails.

- Mark

On 12/5/10 3:36 PM, DM Smith wrote:
> Thanks Uwe (and others). We'll adapt. Is there any interest here in knowing if there are any other problems regarding Lucene on Android? [...]
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967016#action_12967016 ]

Yonik Seeley commented on SOLR-1979:
---

bq. The new field is made by concatenating the original field name with _ + the ISO 639 code.

This could be problematic given a large set of language codes, since they could collide with existing dynamic field definitions. Perhaps something with "text" in the name also? Perhaps fieldName_${langCode}Text. Examples: name_enText, name_frText.

It would probably also be nice to be able to map a number of languages to a single field - say you have a single analyzer that can handle CJK; then you may want that whole collection of languages mapped to a single _cjk field. And just because you can detect a language doesn't mean you know how to handle it differently... so also have an optional catch-all that handles all languages not specifically mapped.
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967019#action_12967019 ]

Robert Muir commented on SOLR-1979:
---

bq. Yeah, that makes sense, however, I believe Tika returns 639.

Right, but 639 is just a subset of 3066, etc. So ignore what Tika does: its 639 identifiers are also valid 3066. Our API should be at least 3066; Java 7/ICU already support BCP 47 locale identifiers, so you get the normalization there for free.

{quote}
It would probably also be nice to be able to map a number of languages to a single field - say you have a single analyzer that can handle CJK, then you may want that whole collection of languages mapped to a single _cjk field. And just because you can detect a language doesn't mean you know how to handle it differently... so also have an optional catch-all that handles all languages not specifically mapped.
{quote}

Both of these are good reasons why we must avoid 639-1. We should be able to use things like macrolanguages and "undetermined language".
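Robert's point that BCP 47 normalization comes "for free" on Java 7 can be shown with a small standalone example. This uses only java.util.Locale (the helper names normalize and primaryLanguage are mine, for illustration; nothing here is Solr or Tika code):

```java
import java.util.Locale;

// Java 7's Locale understands BCP 47 language tags directly, so a
// language-detection processor could canonicalize whatever tag form it
// receives and still pull out the plain ISO 639 subtag for field naming.
public class LangTagDemo {
    // Canonicalize a tag's case: language lowercase, script title case,
    // region uppercase (per BCP 47).
    static String normalize(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    // The primary (ISO 639) language subtag, e.g. for a fieldName_${langCode} scheme.
    static String primaryLanguage(String tag) {
        return Locale.forLanguageTag(tag).getLanguage();
    }

    public static void main(String[] args) {
        System.out.println(normalize("ZH-hant-tw"));       // zh-Hant-TW
        System.out.println(primaryLanguage("zh-Hant-TW")); // zh
    }
}
```

This is why accepting full 3066/BCP 47 identifiers costs nothing: 639-1 codes pass through unchanged, while richer tags (scripts, macrolanguages, "und") remain representable.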
[jira] Commented: (SOLR-2266) java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord()
[ https://issues.apache.org/jira/browse/SOLR-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967022#action_12967022 ]

Yonik Seeley commented on SOLR-2266:
---

OK, here's my guess: it's probably due to multiple indexed values per field value. ord/rord uses the StringIndex to get the ord values, which can't handle multiple indexed tokens per field value. The tdate type has a precisionStep > 0, meaning it will index multiple values per field value to speed up range queries. If you don't need faster range queries on this type, then use date instead of tdate.

But the ideal fix here is to eliminate the use of ord/rord, since they also use up more memory... sorting by "created" will instantiate a per-segment long[] FieldCache entry. It would be nice if that could be reused for the function queries too. This is the case if you use ms(). http://wiki.apache.org/solr/FunctionQuery#ms

java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord()

Key: SOLR-2266
URL: https://issues.apache.org/jira/browse/SOLR-2266
Project: Solr
Issue Type: Bug
Affects Versions: 1.4.1
Environment: Mac OS 10.6, java version 1.6.0_22, Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261), Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
Reporter: Peter Wolanin

I have been testing a switch to long and tdate instead of int and date fields in the schema.xml for our Drupal integration. This indexes fine, but search fails with a 500 error.
{code}
INFO: [d7] webapp=/solr path=/select params={spellcheck=true&facet=true&facet.mincount=1&indent=1&spellcheck.q=term&json.nl=map&wt=json&rows=10&version=1.2&fl=id,entity_id,entity,bundle,bundle_name,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=term&bf=recip(rord(created),4,19,19)^200.0} status=500 QTime=4
Dec 5, 2010 11:52:28 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 39
	at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:721)
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
	at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
	at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:61)
	at org.apache.solr.search.function.TopValueSource.getValues(TopValueSource.java:57)
	at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
	at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:123)
	at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
	at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
	at org.apache.lucene.search.Searcher.search(Searcher.java:171)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1101)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at com.acquia.search.HmacFilter.doFilter(HmacFilter.java:62)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at
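A hypothetical rewrite of the boost function from the log above, along the lines Yonik suggests: ms() reuses the per-segment long[] FieldCache entry that sorting on created already builds, instead of the StringIndex that rord() needs (and that breaks on tdate). The recip() constants below are placeholders for illustration, not tuned values; 3.16e-11 is roughly 1/(milliseconds per year), the scaling used in the FunctionQuery wiki's examples.

```
# Before (fails on tdate, and builds a StringIndex):
bf=recip(rord(created),4,19,19)^200.0

# After (works with tdate, reuses the per-segment long[] cache):
bf=recip(ms(NOW,created),3.16e-11,1,1)^200.0
```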
Re: Testing UpdateProcessorChain
On Dec 5, 2010, at 3:34 PM, Yonik Seeley wrote:
> You can specify an alternate update processor with any update command. SolrTestCaseJ4 has this:
>   public static String add(XmlDoc doc, String... args)
> so... you should be able to do something like add(doc("id","10"), "update.processor", "foo")

Yeah, I am calling that. I think the problem is that assertU() calls doLegacyUpdate, which doesn't handle getting the chain.
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
What I am saying is that this is a Java project, and I don't want to write to some least-common-denominator intersection of Java and Android. If an API doesn't exist in Android, I could care less. Instead, why can't interested parties have a little project where we port Lucene Java (perhaps a trivial patch), set up automated tests, etc.? This I would be interested in, but let's keep Lucene Java as Java.

On Dec 5, 2010 9:40 PM, Mark Miller <markrmil...@gmail.com> wrote:
> I have an interest - don't really care if it uses true java or not. I say keep it coming. [...]
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967032#action_12967032 ]

Jan Høydahl commented on SOLR-1979:
---

@Robert: Yes, there must be a way to tell whether or not the language even has a profile, through some well-defined method. It's not important HOW we improve detection certainty, but comparing the top-n distances could help. I'm also a fan of including metrics other than profile similarity if that can help; however, for unique scripts that will automatically be covered by profile similarity. Detailed solution discussions should continue in TIKA-369. Macro languages: see TIKA-493. It makes sense to allow for detecting languages outside 639-1, and I believe RFC 3066 and BCP 47 both re-use the 639 codes, so that if there is a 2-letter code for a language it will be used. 639-1 is what everyone already knows. In general, improvements should be done in Tika space, then used in Solr, thus building one strong language-detection library.

@Grant: I actually planned to do the regex-based field-name mapping in a separate UpdateProcessor, to make things more flexible. Example:

{code:xml}
<processor class="org.apache.solr.update.processor.LanguageFieldMapperUpdateProcessor">
  <str name="languageField">language</str>
  <str name="fromRegEx">(.*?)_lang</str>
  <str name="toRegEx">$1_$lang</str>
  <str name="notSupportedLanguageToRegEx">$1_t</str>
  <str name="supportedLanguages">de,en,fr,it,es,nl</str>
</processor>
{code}

Your thought of allowing language detection for individual fields in one go is also interesting. I'd love to see metadata support in SolrInputDocument, so that one processor could annotate a @language on the fields analyzed. Then the next processor could act on that metadata to rename the field...

@Yonik: By allowing regex naming of field names, we give users a generic tool to avoid field-name clashes by picking the pattern. Mapping multiple languages to the same suffix also makes sense.
[jira] Commented: (SOLR-1048) Ids parameter and fl=score throws an exception for wt=json
[ https://issues.apache.org/jira/browse/SOLR-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967035#action_12967035 ]

Jon Bodner commented on SOLR-1048:
---

The issue is still present in the 1.4.1 code base for Solr. I found the source of the problem. In the ids stage for sharding, the score is not calculated (it was returned in the previous stage), so the DocSlice's scores float array is still null. XMLWriter and BinaryResponseWriter include lines like:

  includeScore = includeScore && ids.hasScores();

but JSONWriter does not. This issue is only going to present itself when you are debugging, since I think the ids parameter is only used for sharding, and Solr uses the javabin wire protocol instead of JSON.

Ids parameter and fl=score throws an exception for wt=json

Key: SOLR-1048
URL: https://issues.apache.org/jira/browse/SOLR-1048
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 1.3
Reporter: Laurent Chavet

http://yourHost:8080/solr/select/?ids=YourDocId&version=2.2&start=0&rows=10&indent=on&fl=score,id&q=%2B*:*

shows that when using ids= the score for docs is null; when using wt=json:

http://yourHost:8080/solr/select/?ids=YourDocId&version=2.2&start=0&rows=10&indent=on&fl=score,id&q=%2B*:*&wt=json

that throws a NullPointerException:

HTTP Status 500 - null java.lang.NullPointerException
	at org.apache.solr.search.DocSlice$1.score(DocSlice.java:120)
	at org.apache.solr.request.JSONWriter.writeDocList(JSONResponseWriter.java:490)
	at org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:140)
	at org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:175)
	at org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:288)
	at org.apache.solr.request.JSONWriter.writeResponse(JSONResponseWriter.java:88)
	at org.apache.solr.request.JSONResponseWriter.write(JSONResponseWriter.java:49)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:847)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Thread.java:619)
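The missing guard Jon describes can be demonstrated in a standalone sketch. The classes below (ScoreGuardDemo, its nested DocSlice) are simplified stand-ins for illustration, not Solr's actual types; the one-line fix mirrors the guard XMLWriter and BinaryResponseWriter already have.

```java
// Standalone sketch of the SOLR-1048 bug: in the sharded "ids" stage the
// DocSlice carries no scores (scores == null), so a writer that reads them
// unconditionally throws a NullPointerException. The guard below is the fix.
public class ScoreGuardDemo {
    static class DocSlice {
        float[] scores;                       // null in the "ids" stage
        boolean hasScores() { return scores != null; }
    }

    static String write(DocSlice slice, boolean includeScore) {
        // The guard JSONWriter lacks: only emit scores when they exist.
        includeScore = includeScore && slice.hasScores();
        return includeScore ? "score=" + slice.scores[0] : "no-score";
    }

    public static void main(String[] args) {
        DocSlice noScores = new DocSlice();   // scores left null
        // Without the guard this would NPE; with it we degrade gracefully.
        System.out.println(write(noScores, true)); // prints no-score
    }
}
```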
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
On 12/5/10 5:05 PM, Robert Muir wrote: what I am saying, is that this is a java project, and I don't want to write to some least common denominator/intersection of java and android. So don't - DM submitting cases that don't work and you not giving a shit are not mutually exclusive. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
FieldCache usage for custom field collapse in solr 1.4
Hey, I'm trying to use the Lucene FieldCache for a custom field collapsing implementation: basically I'm collapsing on a non-stored field, and so am using the FieldCache to retrieve field value instances at query time. I noticed I'm getting some OOMs after deploying it, and after looking into it for a bit, figured that it might be due to using a call like this: StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader, collapseField); where 'reader' is the instance of the SolrIndexReader passed along to the component with the ResponseBuilder.SolrQueryRequest object. As I understand it, this can double memory usage due to (re)loading this FieldCache on a reader-wide basis rather than on a per-segment basis? If so, what would be a way to migrate this code to use a per-segment cache? I'm not sure I understand the semantics there at all... Any help will be greatly appreciated, thanks a lot! Adam
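The per-segment pattern the question is after can be sketched without Lucene on the classpath. The idea: in Lucene 2.9/3.x you would call FieldCache.DEFAULT.getStringIndex on each sub-reader obtained from reader.getSequentialSubReaders() (so cache entries are keyed per segment and survive reopens), and translate a top-level docID into (segment, local docID) using the segments' docBase offsets. The class below is an illustrative stand-in, not Solr's API; the String[][] plays the role of one StringIndex per segment.

```java
// Illustrative sketch of per-segment lookup: one cached value array per
// segment plus docBase arithmetic to map a global docID to a local one.
public class PerSegmentLookup {
    final int[] docBases;         // starting global docID of each segment
    final String[][] segmentVals; // stand-in for one FieldCache entry per segment

    PerSegmentLookup(String[][] segmentVals) {
        this.segmentVals = segmentVals;
        this.docBases = new int[segmentVals.length];
        int base = 0;
        for (int i = 0; i < segmentVals.length; i++) {
            docBases[i] = base;
            base += segmentVals[i].length; // maxDoc() of the segment
        }
    }

    String value(int globalDoc) {
        // Find the last segment whose docBase <= globalDoc (linear here;
        // a binary search is the usual choice for many segments).
        int seg = 0;
        while (seg + 1 < docBases.length && docBases[seg + 1] <= globalDoc) seg++;
        return segmentVals[seg][globalDoc - docBases[seg]];
    }

    public static void main(String[] args) {
        PerSegmentLookup lookup = new PerSegmentLookup(new String[][] {
            {"a", "b", "c"},  // segment 0: global docs 0..2
            {"d", "e"}        // segment 1: global docs 3..4
        });
        System.out.println(lookup.value(1)); // b
        System.out.println(lookup.value(3)); // d
    }
}
```

Because each cache entry is keyed on an unchanging segment reader, reopening the top-level reader only loads values for the new segments instead of duplicating the whole field, which is the memory win the question is chasing.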
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
On Sun, Dec 5, 2010 at 6:10 PM, Mark Miller markrmil...@gmail.com wrote: On 12/5/10 5:05 PM, Robert Muir wrote: what I am saying, is that this is a java project, and I don't want to write to some least common denominator/intersection of java and android. So don't - DM submitting cases that don't work and you not giving a shit are not mutually exclusive. Just trying to say, i dont think we should change the programming language of the project without a proper vote. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
bq. Perhaps that is not possible or too difficult in every case To clarify - sounds like I could be saying, well, perhaps we can't improve every case, but some we can. I'm saying *too difficult in every* case - even if we don't try and *fix a single case* - it's still beneficial for you to report and discuss these issues IMO. And as I said, I'll remain interested. - Mark On 12/5/10 3:40 PM, Mark Miller wrote: I have an interest - don't really care if it uses true java or not. I say keep it coming. Where/if it makes sense, why not make lucene work better with it. Perhaps that is not possible or too difficult in every case - but I'd still like to see the cases pop up. Better than those spam wiki update emails. - Mark On 12/5/10 3:36 PM, DM Smith wrote: Thanks Uwe (and others). We'll adapt. Is there any interest here in knowing if there are any other problems regarding Lucene on Android? From what I see, it is the first mobile platform on which Lucene can run. -- DM On Dec 5, 2010, at 5:16 AM, Uwe Schindler wrote: Hi DM, In Lucene 3.0.3, NativeFSLockFactory no longer acquires a test lock and does not need the process ID anymore, so the java.lang.management package is no longer used. In general, Lucene Java is compatible with the Java 5 SE specification. Android uses Harmony and therefore we cannot guarantee compatibility, as Harmony is not TCK tested (but we do with latest versions, soon there will also be tests on Hudson with Harmony). But only the latest versions of Harmony are really compatible with Lucene, previous versions fail lots of tests (ask Robert), and Android phones use very antique versions of Harmony - it is not even certain that the Java5 Memory Model is correctly implemented in Dalvik! About 3.0.2: Of course this version even works with latest Harmony, so Harmony has the java.lang.management package (which is java.lang!!!), so the bug is in Android, simply by excluding a SE package.
So you should open a bug report at Google and then hope that they fix it and all the phone manufacturers like Motor-Roller will update their Android versions. For your problem: The easy workaround is to use Lucene 3.0.3, or simply use another LockFactory (Android is single-user, so even NoLockFactory would be fine in most cases). These are the same limitations as with the NFS filesystem. Just use FSDir.open(dir, lockFactory). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: DM Smith [mailto:dm-sm...@woh.rr.com] Sent: Sunday, December 05, 2010 12:16 AM To: dev@lucene.apache.org Subject: Exception in migrating from 2.9.x to 3.0.2 on Android The current code that works on Android with 2.9.1, but fails with 3.0.2: Directory dir = FSDirectory.open(file); ... do something with directory ... The error we're seeing is: 12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: java.lang.management.ManagementFactory 12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLock Factory.java:87) 12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactor y.java:142) 12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.store.Directory.makeLock(Directory.java:106) 12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1058) Turns out Android does not have java.lang.management.ManagementFactory. There are several workarounds in client code, but I'm not sure what is best. The bigger question is whether and how Lucene should be modified to accommodate it. Ultimately FSDirectory.open does the following: if (Constants.WINDOWS) { return new SimpleFSDirectory(path, lockFactory); } else { return new NIOFSDirectory(path, lockFactory); } Should Android be a supported client OS?
If so, wouldn't it be better not to have OS specific if-then-else and use reflection or something else? Thanks, DM - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
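Uwe's suggested workaround from this thread can be sketched against the Lucene 3.0 API. This is an assumption-laden illustration (the class name and helper are hypothetical, and it requires the Lucene 3.0 jar): the point is simply to pass an explicit LockFactory so NativeFSLockFactory, which touched java.lang.management in 3.0.2, is never constructed.

```java
// Sketch of the workaround: choose the LockFactory explicitly instead of
// letting FSDirectory.open default to NativeFSLockFactory. On a
// single-process platform like Android, SimpleFSLockFactory (or even
// NoLockFactory.getNoLockFactory()) is sufficient.
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.SimpleFSLockFactory;

public class AndroidDirectoryFactory {
    public static Directory open(File indexDir) throws IOException {
        // Same limitations as NFS apply: this lock is safe only when a
        // single process ever writes the index.
        return FSDirectory.open(indexDir, new SimpleFSLockFactory(indexDir));
    }
}
```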
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
On 12/5/10 6:15 PM, Robert Muir wrote: On Sun, Dec 5, 2010 at 6:10 PM, Mark Millermarkrmil...@gmail.com wrote: On 12/5/10 5:05 PM, Robert Muir wrote: what I am saying, is that this is a java project, and I don't want to write to some least common denominator/intersection of java and android. So don't - DM submitting cases that don't work and you not giving a shit are not mutually exclusive. Just trying to say, i dont think we should change the programming language of the project without a proper vote. Then your just overreacting again. Allow me to sum up for you: DM: hey, we are trying to use lucene on android - this is not working Uwe and someone: thats not real java, we don't support it Rmuir : **%$$!! (kidding - i dont remember what you said) DM: Oh, pardon me. Well okay - but would anyone be interested in us reporting what doesn't work as we go through this? Android is the only mobile platform lucene works on I think. Mark: Oh yeah - interesting - please do. I'd be interested in seeing. Rmuir: don't change the lucene impl language without a vote! Gr! Mark: ?? Native Police: why are you so aggressive? - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967046#action_12967046 ] Grant Ingersoll commented on SOLR-1979: --- bq. @Grant: I actually planned to do the regEx based field name mapping in a separate UpdateProcessor, to make things more flexible I don't really see that it makes it any more flexible. If it were a general-purpose mapper, maybe, but since it is tied to the language field, why not just put it in the language processor? I've already made the method that chooses the output field protected. With that, one merely needs to extend it to provide an alternate method from what you have proposed. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Grant Ingersoll Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this: {code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code} It will then read the text from the inputFields name and subject, perform language identification, and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
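The processor's core decision described above (concatenate input fields, detect, fall back) can be sketched stand-alone. The detect() stub below stands in for Tika's LanguageIdentifier - the real processor would call new LanguageIdentifier(text).getLanguage() - and the class and method names are hypothetical, not the actual SOLR-1979 patch.

```java
// Self-contained sketch of the LanguageIdentifierUpdateProcessor logic.
// detect() is a stub for Tika's LanguageIdentifier; everything else mirrors
// the inputFields/fallback configuration shown in the issue description.
import java.util.Map;

public class LangIdSketch {
    static String detect(String text) {
        // Stub detector: pretend anything containing "bonjour" is French.
        // Tika would return an ISO 639-1 code, or an unreliable guess.
        return text.contains("bonjour") ? "fr" : "";
    }

    static String identify(Map<String, String> doc, String[] inputFields, String fallback) {
        StringBuilder sb = new StringBuilder();
        for (String f : inputFields) {
            String v = doc.get(f);
            if (v != null) sb.append(v).append(' ');
        }
        String lang = detect(sb.toString());
        return lang.isEmpty() ? fallback : lang; // fallback when nothing detected
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("name", "bonjour le monde", "subject", "salut");
        System.out.println(identify(doc, new String[]{"name", "subject"}, "en")); // fr
        System.out.println(identify(Map.of(), new String[]{"name"}, "en"));       // en
    }
}
```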
[jira] Updated: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1979: -- Attachment: SOLR-1979.patch Here's a patch that passes the tests. Note, I modified the Solr base test case to have some new methods to properly call update handlers and then validate the results. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Grant Ingersoll Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this: {code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code} It will then read the text from the inputFields name and subject, perform language identification, and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967048#action_12967048 ] Grant Ingersoll commented on SOLR-1979: --- Note, the patch still needs more tests and needs to check headers, etc., as well as the better field mapping and the proper language support that Robert is talking about. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Grant Ingersoll Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this: {code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code} It will then read the text from the inputFields name and subject, perform language identification, and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap
[ https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967057#action_12967057 ] Nick Pellow commented on LUCENE-2235: - I just upgraded to 3.0.3 and we started getting NullPointerExceptions coming from PerFieldAnalyzerWrapper. We have a PerFieldAnalyzerWrapper that has a null defaultAnalyzer: {code}
private final PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(null);
{code} We add analyzers to all fields that are analyzed, i.e. field.isAnalyzed() == true. getOffsetGap on PerFieldAnalyzerWrapper is being called even for the non-analyzed fields. Is this expected behaviour? Lines 200-203 of DocInverterPerField are: {code}
if (anyToken)
  fieldState.offset += docState.analyzer.getOffsetGap(field);
fieldState.boost *= field.getBoost();
}
{code} Should this be checking that a field is indeed analyzed before calling getOffsetGap? implement PerFieldAnalyzerWrapper.getOffsetGap -- Key: LUCENE-2235 URL: https://issues.apache.org/jira/browse/LUCENE-2235 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 3.0 Environment: Any Reporter: Javier Godoy Assignee: Uwe Schindler Priority: Minor Fix For: 2.9.4, 3.0.3, 3.1, 4.0 Attachments: LUCENE-2235.patch, PerFieldAnalyzerWrapper.patch PerFieldAnalyzerWrapper does not delegate calls to getOffsetGap(Fieldable); instead it returns the default values from the implementation of Analyzer. (Similar to LUCENE-659 PerFieldAnalyzerWrapper fails to implement getPositionIncrementGap) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
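The immediate fix for the NPE described above is the one later suggested on this issue: never construct PerFieldAnalyzerWrapper with a null default analyzer. A minimal sketch against the Lucene 3.0 API (requires the Lucene jar; the class name is hypothetical):

```java
// With a non-null default, getOffsetGap()/getPositionIncrementGap() stay
// well-defined even for fields that have no per-field analyzer registered.
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class AnalyzerSetup {
    public static PerFieldAnalyzerWrapper build() {
        // KeywordAnalyzer is a cheap, safe default for non-analyzed fields.
        PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new KeywordAnalyzer());
        analyzer.addAnalyzer("body", new WhitespaceAnalyzer());
        return analyzer;
    }
}
```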
[jira] Commented: (LUCENE-2599) Deprecate Spatial Contrib
[ https://issues.apache.org/jira/browse/LUCENE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967064#action_12967064 ] Chris Male commented on LUCENE-2599: I just noticed that Solr depends upon some methods in DistanceUtils. We'll need to move that into the module before removing the contrib from 4x. Deprecate Spatial Contrib - Key: LUCENE-2599 URL: https://issues.apache.org/jira/browse/LUCENE-2599 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Affects Versions: 4.0 Reporter: Chris Male Attachments: LUCENE-2599.patch, LUCENE-2599.patch The spatial contrib is blighted by bugs. The latest series, found by Grant and discussed [here|http://search.lucidimagination.com/search/document/c32e81783642df47/spatial_rethinking_cartesian_tiers_implementation] shows that we need to re-think the cartesian tier implementation. Given the need to create a spatial module containing code taken from both lucene and Solr, it makes sense to deprecate the spatial contrib, and start from scratch in the new module. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Exception in migrating from 2.9.x to 3.0.2 on Android
On Sun, Dec 5, 2010 at 9:12 PM, Simon Willnauer simon.willna...@googlemail.com wrote: I personally consider android a valid platform for lucene and we should try to reduce the pain for android folks as much as possible. Changing supported platforms is a totally different thing to me. good, you can start a separate subproject as a port then. but until then, android isnt supported by lucene-java. android is a different programming language, and by supporting it, we change the programming language of the lucene-java project. this requires a vote, until then, its not supported by definition since our documented programming language is java, not android. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967076#action_12967076 ] Robert Muir commented on SOLR-1979: --- {quote} It makes sense to allow for detecting languages outside 639-1, and I believe RFC3066 and BCP47 are both re-using the 639 codes, so that if there is a 2-letter code for a language it will be used. 639-1 is what everyone already knows. In general, improvements should be done in Tika space, then use those in Solr, thus building one strong language detection library. {quote} Yes they do - the 639-1 codes that Tika outputs are also valid BCP47 codes :) But in Solr, when designing up front, I was just saying we shouldn't limit any abstract portion to 639-1 when another implementation might support 3066 or BCP47... we should make sure we allow that. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Grant Ingersoll Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this: {code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code} It will then read the text from the inputFields name and subject, perform language identification, and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Changes Mess
Hi Steven, Yep, like you state below, JIRA *could* be configured to deal with this. In all honesty, putting tons of thought and effort into how to precisely deal with the changes you specify below might be somewhat overkill. Cheers, Chris

On Dec 5, 2010, at 12:17 PM, Steven A Rowe wrote: On 12/5/2010 at 12:19 PM, Robert Muir wrote: On Sun, Dec 5, 2010 at 12:08 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Mark, RE: the credit system. JIRA provides a contribution report here, like this one that I generated for Lucene 3.1: My concern with this is that it leaves out important email contributors.

I agree, this is a serious problem. My additional problems with JIRA-generated changes:

1. Huge undifferentiated change lists are frightening and nearly useless, regardless of the quality of the descriptions. JIRA's issue types are: Bug, New Feature, Improvement, Test, Wish, Task. Even if we used JIRA's issue types to group issues, they are not the same as Lucene's CHANGES.txt issue types: Changes in backwards compatibility policy, Changes in runtime behavior, API Changes, Documentation, Bug fixes, New features, Optimizations, Build, Test Cases, Infrastructure. (I left out Requirements, last used in 2006 under release 1.9 RC1, since Build seems to have replaced it.)

2. There are now four separate CHANGES.txt files in the Lucene code base, excluding Solr and its modules (each of which has one of them). This number will only grow as more Lucene contribs become modules. The JIRA project components list is outdated / incomplete / has different granularity than the CHANGES.txt locations, so using it to group JIRA issues would not work because they don't align with Lucene/Solr components.

3. Some of the CHANGES.txt entries draw from multiple JIRA issues. From dev/trunk/lucene/CHANGES.txt: Trunk: 9 out of 56 include multiple JIRA issues; 3.X: 7/94; 3.0.0: 3/29; 2.9.0: 9/153. I'm assuming a JIRA dump can't do this.

4. Some JIRA issues appear under multiple change categories in CHANGES.txt. From dev/trunk/lucene/CHANGES.txt: Trunk: 3 out of 68 multiply categorized; 3.X: 9/102; 3.0.0: 1/53; 2.9.0: 20/166. A JIRA dump would not allow for multiple issue categorization, since JIRA only allows a single issue type to be assigned - I guess they are assumed to be mutually exclusive.

Maybe our use of JIRA could be changed to address some of these problems, through addition of new fields and/or modification of existing fields' allowable values? Steve

++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Changes Mess
Hi Chris, On 12/5/2010 at 10:36 PM, Chris Mattman wrote: Yep, like you state below JIRA *could* be configured to deal with this. In all honesty, putting tons of thought and effort into how to precisely deal with the changes you specify below might be somewhat overkill. I think dumping CHANGES.txt in favor of output from a badly misconfigured issue tracking system would be foolish. One way to deal with the problem is to stay with CHANGES.txt. (We've been down this road before, and this is where we landed in the past.) Another would be to fix the issue tracking system. Yet another way would be to declare the problem non-existent and screw our users by insulting them with a honking great mass of changes without any indication about what they are or how they are inter-related. (You won't be surprised at this point, I think, by my -1 to this.) Steve
Re: Changes Mess
Yet another way would be to declare the problem non-existent and screw our users by insulting them with a honking great mass of changes without any indication about what they are or how they are inter-related. (You won't be surprised at this point, I think, by my -1 to this.) Right, I'm one of those users (have been in the past and am somewhat still) as well as a former member of the PMC. So acting like I'm suggesting screwing them over (and "them" would include me), when I'm simply suggesting that solving this mess completely is intractable - so you just have to go with a heuristic (which I'd argue isn't worth spending oodles of time on) - is also a bit insulting. I suggested that JIRA can handle this. We're using it in, oh, about 2-3 Apache projects I'm on and it's working great. If you think it's a mess for all the stuff you put in the email, great, that's your prerogative. I'm just saying in my experience it hasn't been that bad. Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967086#action_12967086 ] JohnWu commented on SOLR-1395: -- TomLiu: I am still stuck at the query dispatch to the sub-proxy! SEVERE: Error calling public abstract org.apache.solr.katta.KattaResponse org.apache.solr.katta.ISolrServer.request(java.lang.String[],org.apache.solr.katta.KattaRequest) throws java.lang.Exception on pc-slave02:2 (try # 1 of 3) (id=0) java.lang.reflect.InvocationTargetException So I give you my config in the proxy; please review it: In the proxy: 1) solrHome - solrconfig.xml:
<config>
  <requestHandler name="standard" class="solr.KattaRequestHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="shards">*</str>
    </lst>
  </requestHandler>
</config>
OK, all the shards are watched and held in ZooKeeper; through zookeeper zkCli.sh: [zk: pc-master(CONNECTED) 11] ls /katta/shard-to-nodes [SPIndex05#1287138886138-99384445, SPIndex04#1287138886138-99384445] 2) In the proxy katta.node.properties: node.server.class=net.sf.katta.lib.lucene.LuceneServer 3) query: http://localhost:8080/solr-1395-katta-0.6.2-2patch/select/?q=lovealice&version=2.2&start=0&rows=10&indent=on&isShard=false&distrib=true Is that right? Especially step 2. Thanks!
JohnWu Integrate Katta --- Key: SOLR-1395 URL: https://issues.apache.org/jira/browse/SOLR-1395 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: Next Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar Original Estimate: 336h Remaining Estimate: 336h We'll integrate Katta into Solr so that: * Distributed search uses Hadoop RPC * Shard/SolrCore distribution and management * Zookeeper based failover * Indexes may be built using Hadoop -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory
[ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967100#action_12967100 ] Shai Erera commented on LUCENE-2471: At some point IndexInput/Output.copyBytes did use FileChannel optimization in FSDirectory, but that caused troubles I think when the copying thread was interrupted. So it was removed and we were left w/ the default impl. Supporting bulk copies in Directory --- Key: LUCENE-2471 URL: https://issues.apache.org/jira/browse/LUCENE-2471 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Earwin Burrfoot Fix For: 3.1, 4.0 A method can be added to IndexOutput that accepts IndexInput, and writes bytes using it as a source. This should be used for bulk-merge cases (offhand - norms, docstores?). Some Directories can then override default impl and skip intermediate buffers (NIO, MMap, RAM?). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
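The default (non-FileChannel) copy path the comment refers to is essentially a fixed-buffer read/write loop. The sketch below illustrates that pattern over plain java.io streams rather than Lucene's IndexInput/IndexOutput, so it is runnable stand-alone - it is not Lucene's exact code.

```java
// Fixed-buffer bulk copy, in the shape of the default copyBytes impl:
// read up to BUFFER_SIZE bytes at a time and write them through, failing
// fast if the source runs out before numBytes are delivered.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class CopyBytesSketch {
    static final int BUFFER_SIZE = 1024;

    static void copyBytes(InputStream in, OutputStream out, long numBytes) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        while (numBytes > 0) {
            int toRead = (int) Math.min(buffer.length, numBytes);
            int read = in.read(buffer, 0, toRead);
            if (read == -1) throw new IOException("read past EOF");
            out.write(buffer, 0, read);
            numBytes -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] src = new byte[4000];
        Arrays.fill(src, (byte) 7);
        ByteArrayOutputStream dst = new ByteArrayOutputStream();
        copyBytes(new ByteArrayInputStream(src), dst, src.length);
        System.out.println(Arrays.equals(src, dst.toByteArray())); // true
    }
}
```

A Directory override can skip the intermediate buffer entirely (e.g. FileChannel.transferTo for FS directories, or direct array copies for RAM directories), which is the optimization the issue proposes - with the caveat Shai notes that interrupting a thread blocked on a FileChannel closes the channel.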
[jira] Created: (LUCENE-2800) Search Index Generation fails
Search Index Generation fails - Key: LUCENE-2800 URL: https://issues.apache.org/jira/browse/LUCENE-2800 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.0.0 Environment: Windows Server 2003 Reporter: Sunitha Belavagi Hi, We are using Lucene 2.0.0 for the search index in our Comergent application. It had been working fine for more than three years, but since this week it has been throwing an exception while creating a new index and also during incremental indexing. Below is the exception: com.comergent.api.appservices.productService.ProductServiceException: java.io.IOException: Cannot delete ...\searchIndex\en_US\MasterIndex_602580\segments at com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.indexPCFromCache(CatalogIndexSetBuilder.java:634) at com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.buildIndexSet(CatalogIndexSetBuilder.java:276) at com.comergent.appservices.search.indexBuilder.IndexSetBuilder$BuilderThread.run(IndexSetBuilder.java:469) Caused by: java.io.IOException: Cannot delete searchIndex\en_US\MasterIndex_602580\segments at org.apache.lucene.store.FSDirectory.renameFile(FSDirectory.java:268) at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:95) at org.apache.lucene.index.IndexWriter$4.doBody(IndexWriter.java:726) at org.apache.lucene.store.Lock$With.run(Lock.java:99) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:724) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:674) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:479) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:462) at com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.indexPCFromCache(CatalogIndexSetBuilder.java:630) ...
2 more 2010.12.05 06:25:13:532 Env/Thread-21961:ERROR:CatalogIndexSetBuilder CatalogIndexSetBuilder: [MasterIndex_602580] - Exception: com.comergent.api.appservices.productService.ProductServiceException: java.io.IOException: Cannot delete ...\MasterIndex_602580\segments 2010.12.05 06:25:13:532 Env/Thread-21961:INFO:CMGT_SEARCH IndexSetBuilder$BuilderThread: error building the index for: MasterIndex_602580 com.comergent.api.exception.ComergentException: com.comergent.api.appservices.productService.ProductServiceException: java.io.IOException: Cannot delete \searchIndex\en_US\MasterIndex_602580\segments at com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.buildIndexSet(CatalogIndexSetBuilder.java:305) at com.comergent.appservices.search.indexBuilder.IndexSetBuilder$BuilderThread.run(IndexSetBuilder.java:469) Caused by: com.comergent.api.appservices.productService.ProductServiceException: java.io.IOException: Cannot delete ...\MasterIndex_602580\segments at com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.indexPCFromCache(CatalogIndexSetBuilder.java:634) at com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.buildIndexSet(CatalogIndexSetBuilder.java:276) ... 
1 more Caused by: java.io.IOException: Cannot delete ...\MasterIndex_602580\segments at org.apache.lucene.store.FSDirectory.renameFile(FSDirectory.java:268) at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:95) at org.apache.lucene.index.IndexWriter$4.doBody(IndexWriter.java:726) at org.apache.lucene.store.Lock$With.run(Lock.java:99) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:724) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:674) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:479) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:462) at com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.indexPCFromCache(CatalogIndexSetBuilder.java:630) ... 2 more 2010.12.05 06:25:13:938 Env/http-8080-Processor75:INFO:CMGT_SEARCH IndexSetBuilder: error building the index: com.comergent.api.appservices.search.exception.IndexingException: Error in executing some builder threads... at com.comergent.appservices.search.indexBuilder.IndexSetBuilder.monitor(IndexSetBuilder.java:440) at com.comergent.appservices.search.indexBuilder.IndexSetBuilder.build(IndexSetBuilder.java:185) at
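The root cause above, "Cannot delete ...\segments" out of FSDirectory.renameFile on Windows Server 2003, usually means some other process or thread (an open IndexReader/searcher, a backup job, or a virus scanner) still holds the old segments file open; unlike Unix, Windows refuses to delete a file with open handles. A common workaround is to retry the delete briefly once the other handle closes. Below is a minimal sketch of such a retry helper in plain java.io; it is hypothetical, not part of Lucene:

```java
import java.io.File;

// Hypothetical helper (not part of Lucene): retry a delete that can fail
// transiently on Windows while another process still holds the file open.
public class RetryDelete {
    public static boolean deleteWithRetry(File f, int attempts, long waitMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (!f.exists() || f.delete()) {
                return true;            // already gone, or deleted successfully
            }
            Thread.sleep(waitMillis);   // give the other handle time to close
        }
        return false;                   // still locked after all attempts
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("segments", ".tmp");
        // No other process holds this temp file, so the delete succeeds.
        System.out.println(deleteWithRetry(f, 5, 100));  // prints "true"
    }
}
```

If the delete keeps failing even with retries, a tool such as Process Explorer can show which process owns the handle on the segments file.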
[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap
[ https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967115#action_12967115 ]

Uwe Schindler commented on LUCENE-2235:
---------------------------------------

Hi Nick, thanks for reporting this. Your problem only appears since the missing method was added: before, PerFieldAnalyzerWrapper returned some default; now it throws an NPE in that case. In general, Lucene does not support *null* analyzers anywhere, neither as a ctor argument to IW/IWC nor, e.g., here. You should always pass a simple analyzer (WhitespaceAnalyzer, SimpleAnalyzer, KeywordAnalyzer) to IndexWriter and to other methods that take an Analyzer. To really fix this, we would have to review all places that don't actually need to call the Analyzer. There are other such places: e.g., even when you pass a TokenStream directly to the field with new Field(name, TokenStream), the analyzer is still called, so you still have to provide one.

implement PerFieldAnalyzerWrapper.getOffsetGap
----------------------------------------------

                 Key: LUCENE-2235
                 URL: https://issues.apache.org/jira/browse/LUCENE-2235
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 3.0
         Environment: Any
            Reporter: Javier Godoy
            Assignee: Uwe Schindler
            Priority: Minor
             Fix For: 2.9.4, 3.0.3, 3.1, 4.0
         Attachments: LUCENE-2235.patch, PerFieldAnalyzerWrapper.patch

PerFieldAnalyzerWrapper does not delegate calls to getOffsetGap(Fieldable); instead it returns the default values from the base Analyzer implementation. (Similar to LUCENE-659, where PerFieldAnalyzerWrapper fails to implement getPositionIncrementGap.)

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
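The missing delegation the issue describes can be sketched in plain Java. The classes below are simplified stand-ins for Lucene's Analyzer and PerFieldAnalyzerWrapper (the real method takes a Fieldable, not a String, and the real wrapper does much more); they only illustrate the per-field delegation the patch adds:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Lucene's Analyzer: a base default for the gap.
class Analyzer {
    public int getOffsetGap(String fieldName) { return 1; }
}

// A per-field analyzer with a non-default gap, to make delegation visible.
class BigGapAnalyzer extends Analyzer {
    @Override
    public int getOffsetGap(String fieldName) { return 100; }
}

// Simplified stand-in for PerFieldAnalyzerWrapper. The bug: without the
// getOffsetGap override below, the wrapper silently falls back to the base
// Analyzer default instead of asking the analyzer registered for the field.
class PerFieldAnalyzerWrapper extends Analyzer {
    private final Analyzer defaultAnalyzer;
    private final Map<String, Analyzer> fieldAnalyzers = new HashMap<>();

    PerFieldAnalyzerWrapper(Analyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Analyzer a) { fieldAnalyzers.put(field, a); }

    @Override
    public int getOffsetGap(String fieldName) {
        Analyzer a = fieldAnalyzers.get(fieldName);
        return (a != null ? a : defaultAnalyzer).getOffsetGap(fieldName);
    }
}

public class Demo {
    public static void main(String[] args) {
        PerFieldAnalyzerWrapper w = new PerFieldAnalyzerWrapper(new Analyzer());
        w.addAnalyzer("body", new BigGapAnalyzer());
        System.out.println(w.getOffsetGap("body"));   // prints 100: delegated
        System.out.println(w.getOffsetGap("title"));  // prints 1: default used
    }
}
```

Note the delegation falls back to the wrapper's default analyzer, never to the base-class default; this is also why a null default analyzer leads straight to an NPE, as discussed in the comment above.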
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967118#action_12967118 ]

tom liu commented on SOLR-1395:
-------------------------------

In the proxy, katta.node.properties:

#node.server.class=net.sf.katta.lib.lucene.LuceneServer
node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

You must also put apache-solr-core-XXX.jar, plus its related jars, into Katta's lib directory.

Integrate Katta
---------------

                 Key: SOLR-1395
                 URL: https://issues.apache.org/jira/browse/SOLR-1395
             Project: Solr
          Issue Type: New Feature
    Affects Versions: 1.4
            Reporter: Jason Rutherglen
            Priority: Minor
             Fix For: Next
         Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
   Original Estimate: 336h
  Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* distributed search uses Hadoop RPC
* shards/SolrCores are distributed and managed
* failover is ZooKeeper-based
* indexes may be built using Hadoop
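The jar-copying step in the comment above can be sketched as a small deployment helper. The paths and version in main are illustrative assumptions, not fixed by the issue; only the requirement (Solr's jars must end up in Katta's lib directory so the node can load org.apache.solr.katta.DeployableSolrKattaServer) comes from the comment:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical deployment helper: copy every jar from a Solr dist directory
// into Katta's lib directory so the Katta node can load the Solr server class.
public class CopySolrJars {
    public static int copyJars(Path solrDist, Path kattaLib) throws IOException {
        int copied = 0;
        // Glob keeps non-jar files (README, licenses, ...) out of Katta's lib.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(solrDist, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, kattaLib.resolve(jar.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative default locations; adjust to your installation.
        Path solrDist = Paths.get(args.length > 0 ? args[0] : "/opt/apache-solr-1.4/dist");
        Path kattaLib = Paths.get(args.length > 1 ? args[1] : "/opt/katta/lib");
        System.out.println("copied " + copyJars(solrDist, kattaLib) + " jars");
    }
}
```

After copying, restart the Katta node so the new classpath entries are picked up.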