[jira] [Commented] (LUCENE-4669) Document wrongly deleted from index
[ https://issues.apache.org/jira/browse/LUCENE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549495#comment-13549495 ] Adrien Grand commented on LUCENE-4669:
--------------------------------------

Hi Miguel,

bq. One more question: what's the best way to iterate over all documents in an index?

Retrieving stored fields for all documents in an index is something Lucene is bad at (it doesn't optimize for this use case, on purpose), and you should try to avoid doing it. Otherwise, iterating over all doc ids from 0 to ir.maxDoc(), skipping deleted documents (liveDocs != null && !liveDocs.get(docID)) and calling IndexReader.document(docID) should work.

Please ask questions on the user mailing list instead of JIRA in the future.


Document wrongly deleted from index
-----------------------------------
Key: LUCENE-4669
URL: https://issues.apache.org/jira/browse/LUCENE-4669
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.0
Environment: OS = Mac OS X 10.7.5, Java = JVM 1.6
Reporter: Miguel Ferreira

I'm trying to implement document deletion from an index. If I create an index with three documents (A, B and C) and then try to delete A, A gets marked as deleted but C is removed from the index. I've tried this with different numbers of documents and saw that it is always the last document that is removed. When I run the example unit test code below I get this output:
{code}
Before delete
Found 3 documents
Document at = 0; isDeleted = false; path = a;
Document at = 1; isDeleted = false; path = b;
Document at = 2; isDeleted = false; path = c;
After delete
Found 2 documents
Document at = 0; isDeleted = true; path = a;
Document at = 1; isDeleted = false; path = b;
{code}
Example unit test:
{code:title=ExampleUnitTest.java}
@Test
public void delete() throws Exception {
  File indexDir = FileUtils.createTempDir();
  IndexWriter writer = new IndexWriter(new NIOFSDirectory(indexDir),
      new IndexWriterConfig(Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40)));
  Document doc = new Document();
  String fieldName = "path";
  doc.add(new StringField(fieldName, "a", Store.YES));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new StringField(fieldName, "b", Store.YES));
  writer.addDocument(doc);
  doc = new Document();
  doc.add(new StringField(fieldName, "c", Store.YES));
  writer.addDocument(doc);
  writer.commit();
  System.out.println("Before delete");
  print(indexDir);
  writer.deleteDocuments(new Term(fieldName, "a"));
  writer.commit();
  System.out.println("After delete");
  print(indexDir);
}

public static void print(File indexDirectory) throws IOException {
  DirectoryReader reader = DirectoryReader.open(new NIOFSDirectory(indexDirectory));
  Bits liveDocs = MultiFields.getLiveDocs(reader);
  int numDocs = reader.numDocs();
  System.out.println("Found " + numDocs + " documents");
  for (int i = 0; i < numDocs; i++) {
    Document document = reader.document(i);
    StringBuffer sb = new StringBuffer();
    sb.append("Document at = ").append(i);
    sb.append("; isDeleted = ").append(liveDocs != null ? !liveDocs.get(i) : false).append("; ");
    for (IndexableField field : document.getFields()) {
      String fieldName = field.name();
      for (String value : document.getValues(fieldName)) {
        sb.append(fieldName).append(" = ").append(value).append("; ");
      }
    }
    System.out.println(sb.toString());
  }
}
{code}
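For readers hitting the same trap: the print() method above uses numDocs() as the upper bound for doc ids, so after a deletion the last live document is simply never printed; nothing was actually removed from the index. A minimal sketch of the iteration Adrien describes (Lucene 4.x API; the indexDir argument and the "path" field are placeholders taken from the example above):
{code:title=AllDocsExample.java}
import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.util.Bits;

public class AllDocsExample {

  public static void printAllDocuments(File indexDir) throws IOException {
    DirectoryReader reader = DirectoryReader.open(new NIOFSDirectory(indexDir));
    try {
      Bits liveDocs = MultiFields.getLiveDocs(reader); // null when there are no deletions
      // Iterate the full doc id space [0, maxDoc()), not [0, numDocs()).
      for (int docID = 0; docID < reader.maxDoc(); docID++) {
        if (liveDocs != null && !liveDocs.get(docID)) {
          continue; // deleted document, skip it
        }
        Document doc = reader.document(docID);
        System.out.println(docID + " -> " + doc.get("path"));
      }
    } finally {
      reader.close();
    }
  }
}
{code}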
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 57 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/57/
Java: 64bit/jdk1.6.0 -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 24615 lines...]
BUILD FAILED
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-4.x-MacOSX/build.xml:60: The following error occurred while executing this line:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-4.x-MacOSX/lucene/build.xml:310: The following error occurred while executing this line:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-4.x-MacOSX/lucene/common-build.xml:1920: javax.script.ScriptException: javax.script.ScriptException: org.parboiled.errors.ParserRuntimeException: Error while parsing action 'Root/Sequence/ZeroOrMore/Sequence/Block/FirstOf/Heading/FirstOf/AtxHeading/OneOrMore/Sequence/AtxInline/Inline/Inline_Action1' at input position (line 1, pos 3):
# JRE Version Migration Guide
  ^

org.pegdown.ParsingTimeoutException
   at org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.eval(GroovyScriptEngineImpl.java:138)
   at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:247)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.tools.ant.util.ReflectUtil.invoke(ReflectUtil.java:108)
   at org.apache.tools.ant.util.ReflectWrapper.invoke(ReflectWrapper.java:81)
   at org.apache.tools.ant.util.optional.JavaxScriptRunner.evaluateScript(JavaxScriptRunner.java:103)
   at org.apache.tools.ant.util.optional.JavaxScriptRunner.executeScript(JavaxScriptRunner.java:67)
   at org.apache.tools.ant.types.optional.ScriptFilter.filter(ScriptFilter.java:110)
   at org.apache.tools.ant.filters.TokenFilter.read(TokenFilter.java:114)
   at org.apache.tools.ant.filters.BaseFilterReader.read(BaseFilterReader.java:83)
   at java.io.BufferedReader.read1(BufferedReader.java:185)
   at java.io.BufferedReader.read(BufferedReader.java:261)
   at org.apache.tools.ant.util.ResourceUtils.copyResource(ResourceUtils.java:494)
   at org.apache.tools.ant.util.FileUtils.copyFile(FileUtils.java:559)
   at org.apache.tools.ant.taskdefs.Copy.doFileOperations(Copy.java:875)
   at org.apache.tools.ant.taskdefs.Copy.execute(Copy.java:549)
   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
   at org.apache.tools.ant.Task.perform(Task.java:348)
   at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
   at org.apache.tools.ant.Task.perform(Task.java:348)
   at org.apache.tools.ant.taskdefs.MacroInstance.execute(MacroInstance.java:398)
   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
   at org.apache.tools.ant.Task.perform(Task.java:348)
   at org.apache.tools.ant.Target.execute(Target.java:390)
   at org.apache.tools.ant.Target.performTasks(Target.java:411)
   at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
   at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
   at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
   at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
   at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:302)
   at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221)
   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at
[jira] [Created] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
Adrien Grand created LUCENE-4674:
------------------------------------
Summary: Consistently set offset=0 in BytesRef.copyBytes
Key: LUCENE-4674
URL: https://issues.apache.org/jira/browse/LUCENE-4674
Project: Lucene - Core
Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor

BytesRef.copyBytes(BytesRef other) has two branches:
- either the destination array is large enough and it will copy bytes after offset,
- or it needs to resize and in that case it will set offset = 0.

I think this method should always set offset = 0 for consistency, and to avoid resizing when other.length is larger than this.bytes.length - this.offset but smaller than this.bytes.length.
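For reference, here is a sketch of the two branches being discussed; it is a hedged paraphrase of the 4.x behavior as described in the issue, not the verbatim Lucene source, and it assumes BytesRef's public bytes/offset/length fields:
{code:title=CopyBytesSketch.java}
// Sketch of BytesRef.copyBytes(BytesRef) as described above.
public void copyBytes(BytesRef other) {
  if (bytes.length - offset < other.length) {
    // Branch 2: the destination slice is too small; allocate a fresh
    // array and reset offset to 0.
    bytes = new byte[other.length];
    offset = 0;
  }
  // Branch 1 (or after reallocation): copy in place starting at offset.
  System.arraycopy(other.bytes, other.offset, bytes, offset, other.length);
  length = other.length;
}
{code}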
[jira] [Updated] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4674:
---------------------------------
Attachment: LUCENE-4674.patch

Patch. Additionally I added a call to ArrayUtil.oversize to make resizing less likely.
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549533#comment-13549533 ] Robert Muir commented on LUCENE-4674:
--------------------------------------
I don't really agree (I don't think this class should be treated like StringBuffer). Changing offset to 0 is fine when we make a new array; otherwise it is definitely and 100% certainly NOT OK, as we may overwrite unrelated data.
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549534#comment-13549534 ] Robert Muir commented on LUCENE-4674:
--------------------------------------
Moreover, any proposed changes here should also include the changes to IntsRef, LongsRef, CharsRef, and so on before even being considered. Otherwise the APIs just get out of whack. Maybe we should seriously consider just switching to java.nio.Buffer.
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549536#comment-13549536 ] Uwe Schindler commented on LUCENE-4674:
---------------------------------------
I agree with Robert. We had BytesRef and CharsRef doing that stuff in the past. But as the name of the class is *Ref, not *Buffer, it should only hold a reference to a byte[] and not change or grow it. Especially, it should not change offset. This is risky: if you get a BytesRef that points to some slice in a larger buffer and you suddenly resize it, you invalidate content that might be needed by other stuff (e.g. while iterating terms, the previous/next term gets corrupted). I would in any case favour using ByteBuffer instead of this unsafe and incomplete duplicate. BytesRef is a mess for user-facing APIs.
[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API
[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4620:
-------------------------------
Attachment: LUCENE-4620.patch

Patch makes the following changes:
* {{IntEncoder.encode()}} takes an {{IntsRef}} and {{BytesRef}} and encodes the integers from {{IntsRef}} to {{BytesRef}}. Similarly, {{IntDecoder.decode()}} takes a {{BytesRef}} and {{IntsRef}} and decodes the integers from the byte array to the integer array.
* {{CategoryListIterator}} and {{Aggregator}} were changed to do bulk handling of category ordinals as well.
* In the process I merged some methods, such as {{PayloadIterator.setdoc}} and {{PayloadIterator.getPayload}}, as well as {{AssociationsPayloadIterator}}, to reduce even further the number of method calls that happen during search.
* Added a test which tests MultiCategoryListIterator (we didn't have one!) and improved EncodingTest to test a large number of random values.

All tests pass, and 'ant javadocs' passes too.


Explore IntEncoder/Decoder bulk API
-----------------------------------
Key: LUCENE-4620
URL: https://issues.apache.org/jira/browse/LUCENE-4620
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Attachments: LUCENE-4620.patch

Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) and decode(int). Originally, we believed that this layer could be useful for other scenarios, but in practice it's used only for writing/reading the category ordinals from payload/DV. Therefore, Mike and I would like to explore a bulk API, something like encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder can still be streaming (as we don't know in advance how many ints will be written), dunno. Will figure this out as we go.

One thing to check is whether the bulk API can work w/ e.g. facet associations, which can write arbitrary byte[], and so maybe decoding to an IntsRef won't make sense. This too we'll figure out as we go. I don't rule out that associations will use a different bulk API.

At the end of the day, the requirement is for someone to be able to configure how ordinals are written (i.e. different encoding schemes: VInt, PackedInts etc.) and later read, with as little overhead as possible.
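To make the bulk shape concrete, here is a hedged sketch of a VInt-style encoder with the encode(IntsRef, BytesRef) signature described above. It is illustrative only, not the patch's actual encoder classes, and it assumes the BytesRef is a scratch buffer owned by the caller (so resetting its offset is safe):
{code:title=SimpleVIntEncoder.java}
import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;

public class SimpleVIntEncoder {

  /** Encodes values.ints[values.offset .. values.offset+values.length) as VInts into buf. */
  public void encode(IntsRef values, BytesRef buf) {
    buf.offset = 0; // assumption: buf is a caller-owned scratch buffer, not a shared slice
    buf.length = 0;
    final int upto = values.offset + values.length;
    for (int i = values.offset; i < upto; i++) {
      int v = values.ints[i];
      // Standard VInt: low 7 bits first, high bit set on continuation bytes.
      while ((v & ~0x7F) != 0) {
        append(buf, (byte) ((v & 0x7F) | 0x80));
        v >>>= 7;
      }
      append(buf, (byte) v);
    }
  }

  private static void append(BytesRef buf, byte b) {
    if (buf.length == buf.bytes.length) {
      buf.bytes = ArrayUtil.grow(buf.bytes, buf.length + 1);
    }
    buf.bytes[buf.length++] = b;
  }
}
{code}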
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549538#comment-13549538 ] Adrien Grand commented on LUCENE-4674:
--------------------------------------
I still find it confusing that we are allowed to write past offset + length but not before offset. Switching to the java.nio buffers sounds good.
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549542#comment-13549542 ] Shai Erera commented on LUCENE-4674:
------------------------------------
I recently (LUCENE-4620) moved some facets code to use BytesRef and IntsRef and found these two classes very convenient. The only thing that I found missing is a *Ref.upto. E.g., I first made the mistake {{for (int i = bytes.offset; i < bytes.length; i++)}}, where the correct form is {{for (int i = bytes.offset; i < bytes.length + bytes.offset; i++)}} (but then you need to do that '+' at every iteration, or extract it to a variable).

I considered using ByteBuffer instead, but as long as e.g. a Payload is represented as a BytesRef, it's a waste to always ByteBuffer.wrap(BytesRef.bytes, offset, length). I used BytesRef as it was very convenient (and if we added an 'upto' index to them, that'd be even greater :)).

I agree that grow() is currently risky, as it may overwrite some data that is used by another thread (as a slice of the buffer). But that can be solved with proper documentation, I think. I also agree that we should not set offset to 0. I did that, and MemoryCodec got upset :). For all practical purposes, apps should treat offset and length as final (we should not make them final though, just document it). If an app messes with them, it had better know what it's doing.
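A tiny sketch of the pattern Shai describes, with the slice's exclusive end hoisted into a local variable; the sumBytes helper is hypothetical, purely for illustration:
{code:title=SliceIteration.java}
import org.apache.lucene.util.BytesRef;

public class SliceIteration {
  public static int sumBytes(BytesRef ref) {
    final int upto = ref.offset + ref.length; // exclusive end of the slice, computed once
    int sum = 0;
    for (int i = ref.offset; i < upto; i++) { // NOT "i < ref.length"
      sum += ref.bytes[i] & 0xFF;
    }
    return sum;
  }
}
{code}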
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549543#comment-13549543 ] Robert Muir commented on LUCENE-4674:
--------------------------------------
The whole class is confusing. But the problem with this proposed change is very simple:
{code}
BytesRef a = new BytesRef(bigbyte, 0, 5);
BytesRef b = new BytesRef(bigbyte, 5, 10);
b.copy(someOtherStuff...)
{code}
should *NOT* muck with a. A is unrelated to B.

I think realistically we should avoid methods like append/copy altogether, as they encourage more StringBuffer-type use like this. If you want a StringBuffer-type class, it can safely support methods like this, but then it should *own the array* (make a copy).
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549544#comment-13549544 ] Adrien Grand commented on LUCENE-4674:
--------------------------------------
bq. b.copy(someOtherStuff...) should NOT muck with a.

Unfortunately a.copy(otherStuff) will modify b if otherStuff.length > 5.
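A runnable illustration of that hazard under the 4.x semantics discussed here: two refs share one backing array, and copyBytes() on the first slice silently overwrites the second because no reallocation is triggered:
{code:title=SharedArrayHazard.java}
import org.apache.lucene.util.BytesRef;

public class SharedArrayHazard {
  public static void main(String[] args) {
    byte[] bigbyte = new byte[15];
    BytesRef a = new BytesRef(bigbyte, 0, 5);
    BytesRef b = new BytesRef(bigbyte, 5, 10);
    // 7 > a.length, but 7 <= bigbyte.length - a.offset, so no new array is
    // allocated: the copy writes bigbyte[0..6], clobbering b's first two bytes.
    BytesRef otherStuff = new BytesRef(new byte[] {1, 2, 3, 4, 5, 6, 7});
    a.copyBytes(otherStuff);
    System.out.println(b); // b's contents have silently changed
  }
}
{code}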
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549554#comment-13549554 ] Robert Muir commented on LUCENE-4674:
--------------------------------------
I will open a new issue to remove all write methods from BytesRef. This is a ref class, not a StringBuilder. We have to keep these APIs contained.
[jira] [Resolved] (LUCENE-4670) Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new formats easier
[ https://issues.apache.org/jira/browse/LUCENE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-4670.
----------------------------------
Resolution: Fixed

Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new formats easier
--------------------------------------------------------------------------------------
Key: LUCENE-4670
URL: https://issues.apache.org/jira/browse/LUCENE-4670
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Fix For: 4.1
Attachments: LUCENE-4670.patch, LUCENE-4670.patch, LUCENE-4670.patch, LUCENE-4670.patch

This is especially useful to LUCENE-4599 where actions have to be taken after a doc/field/term has been added.
Re: [jira] [Commented] (LUCENE-3178) Native MMapDir
Haven't run across "play up" in this context (I was raised on the wrong side of the Atlantic), but three definitions I found _all_ apply:

1 *Brit* *informal* to behave irritatingly (towards)
2 *(intr)* *Brit* *informal* (of a machine, car, etc.) to function erratically
*3* *Brit* *informal* to hurt; give (one) pain or trouble

Don't think I've found another two-word phrase that packs in that many varieties of how computers are mean to me so efficiently. Gotta add that one to my vocabulary.

On Wed, Jan 9, 2013 at 2:40 PM, Greg Bowyer (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548885#comment-13548885 ] Greg Bowyer commented on LUCENE-3178:
--------------------------------------
Frustrating; it echoes what I have been seeing, so at least my benchmarking is not playing me up. I guess I will have to do some digging.

Native MMapDir
--------------
Key: LUCENE-3178
URL: https://issues.apache.org/jira/browse/LUCENE-3178
Project: Lucene - Core
Issue Type: Improvement
Components: core/store
Reporter: Michael McCandless
Labels: gsoc2012, lucene-gsoc-12
Attachments: LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch

Spinoff from LUCENE-2793. Just like we will create a native Dir impl (UnixDirectory) to pass the right OS-level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle.
[jira] [Commented] (LUCENE-4670) Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new formats easier
[ https://issues.apache.org/jira/browse/LUCENE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549562#comment-13549562 ] Commit Tag Bot commented on LUCENE-4670:
----------------------------------------
[branch_4x commit] Adrien Grand
http://svn.apache.org/viewvc?view=revision&revision=1431294

LUCENE-4670: Add finish* callbacks to StoredFieldsWriter and TermVectorsWriter.
Re: [jira] [Commented] (SOLR-4112) Dataimporting with SolrCloud Fails
Sausarkar: When you say the index went from 14G to 7G, did you notice whether the difference was in the *.fdt and *.fdx files? That would be due to compression of stored fields, which is now the default. If you could, would you let us know the sizes of the files with those two extensions before and after? I'm trying to gather real-world examples...

But about your slowdown, does the same thing happen if you specify fl=score (and ensure that lazy load is configured in solrconfig.xml)? I don't think that would be reading the fields off disk and decompressing them... what are you measuring? Total time to return to the client? It'd also help pin this down if you looked just at QTime in the responses; that should be exclusive of time to assemble the documents, it's purely searching.

Thanks,
Erick

On Wed, Jan 9, 2013 at 8:50 PM, sausarkar sausar...@ebay.com wrote:
We are using solr-meter for generating query load of around 110 queries per second per node. With 4.1 the average query time is 300 msec; if we switch to 4.0 the average query time is around 11 msec. We used the same load test params and same 10 million records; the only differences are the version and index files: 4.1 has 7GB and 4.0 has 14GB.

--
View this message in context: http://lucene.472066.n3.nabble.com/jira-Created-SOLR-4112-Dataimporting-with-SolrCloud-Fails-tp4022365p4032084.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549567#comment-13549567 ] Uwe Schindler commented on LUCENE-3178:
---------------------------------------
I think this is largely related to Robert's comment:

bq. Might be interesting to revisit now that we use block compression that doesn't readByte(), readByte(), readByte() and hopefully avoids some of the bounds checks and so on that I think it helped with.

Since we moved to block codecs, the use of single-byte gets on the byte buffer is largely reduced. It now just reads blocks of data, so MappedByteBuffer can do that efficiently using a memcpy(). Some MTQs are still faster because they read many more blocks for a large number of terms. I would have expected no significant speed-up at all for, e.g., NRQ. Additionally, when using the ByteBuffer methods to get bytes, I think newer Java versions use intrinsics that may no longer be used with your directory impl. I would not provide a custom MMapDir at all; it is too risky and no longer brings a large speed-up (Java 7 + block postings).
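The access-pattern difference Uwe is pointing at, as a hedged sketch (hypothetical helper methods, not Lucene code): a block codec issues one bulk get() per block, which a MappedByteBuffer can service with a single bounds check and a memcpy-like copy, instead of one call and bounds check per byte:
{code:title=BulkVsSingleByte.java}
import java.nio.ByteBuffer;

public class BulkVsSingleByte {

  // One bulk read per block: a single bounds check, memcpy-like under the hood.
  static void readBlockBulk(ByteBuffer in, byte[] block) {
    in.get(block, 0, block.length);
  }

  // The old readByte()-at-a-time pattern: one call and bounds check per byte.
  static void readBlockSingle(ByteBuffer in, byte[] block) {
    for (int i = 0; i < block.length; i++) {
      block[i] = in.get();
    }
  }
}
{code}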
[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API
[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549568#comment-13549568 ] Michael McCandless commented on LUCENE-4620:
--------------------------------------------
Looks like there were some svn mv's, so the patch doesn't directly apply ... Can you regenerate the patch using 'svn diff --show-copies-as-adds' (assuming you're using svn 1.7+)? Either that or use dev-tools/scripts/diffSources.py ... thanks.
[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API
[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4620:
-------------------------------
Attachment: LUCENE-4620.patch

Sorry. Can you try now?
[jira] [Created] (LUCENE-4675) remove *Ref.copy/append/grow
Robert Muir created LUCENE-4675:
---------------------------------
Summary: remove *Ref.copy/append/grow
Key: LUCENE-4675
URL: https://issues.apache.org/jira/browse/LUCENE-4675
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

These methods are dangerous: in general, if we want a StringBuilder-type class, then it should own the array, and it can freely do allocation stuff etc.; this is the only way to make it safe. Otherwise, if we want a ByteBuffer-type class, then its reference should be immutable (the byte[]/offset/length should be final), and it should not have allocation stuff.

BytesRef is none of these; it's like a C pointer. Unfortunately Lucene puts these unsafe, dangerous, trappy APIs directly in front of the user. What happens if I have a bug in my application and it accidentally mucks with the term bytes returned by TermsEnum or the payloads from DocsAndPositionsEnum? Will this get merged into a corrupt index?

I think as a start we should remove copy/append/grow to bring this closer to a ref class (e.g. more like java.lang.ref and less like StringBuilder). Nobody needs this stuff on BytesRef; they can already operate on the bytes directly.
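A sketch of the ByteBuffer-style alternative named in the description: the reference itself is immutable, so re-pointing means constructing a new ref rather than mutating shared state. This ImmutableBytesRef class is hypothetical, purely to illustrate the contrast:
{code:title=ImmutableBytesRef.java}
// Hypothetical "true ref": final fields, no grow/copy/append.
public final class ImmutableBytesRef {
  public final byte[] bytes;
  public final int offset;
  public final int length;

  public ImmutableBytesRef(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  /** "Moving" the ref yields a new instance; the original is never mutated. */
  public ImmutableBytesRef slice(int from, int len) {
    return new ImmutableBytesRef(bytes, offset + from, len);
  }
}
{code}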
[jira] [Created] (LUCENE-4676) IndexReader.isCurrent race
Robert Muir created LUCENE-4676:
---------------------------------
Summary: IndexReader.isCurrent race
Key: LUCENE-4676
URL: https://issues.apache.org/jira/browse/LUCENE-4676
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

Revision: 1431169

ant test -Dtestcase=TestNRTManager -Dtests.method=testThreadStarvationNoDeleteNRTReader -Dtests.seed=925ECD106FBFA3FF -Dtests.slow=true -Dtests.locale=fr_CA -Dtests.timezone=America/Kentucky/Louisville -Dtests.file.encoding=US-ASCII -Dtests.dups=500
[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow
[ https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549589#comment-13549589 ] Shai Erera commented on LUCENE-4675:
------------------------------------
I kinda like grow(). Will I be able to grow() the buffer from the outside if you remove it? I.e. will the byte[] not be final?
[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow
[ https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549590#comment-13549590 ] Robert Muir commented on LUCENE-4675:
--------------------------------------
I'm proposing removing these 3 methods from BytesRef itself, that's all.

The guy from the outside knows what he can do: he knows if the bytes actually point to a slice of a PagedBytes (grow is actually senseless here!), or just a simple byte[], or whatever. He doesn't need BytesRef itself to do these things. So he can then change the ref to point at a different slice, or a different byte[] altogether, or whatever.
[jira] [Resolved] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-4674.
----------------------------------
Resolution: Won't Fix
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549596#comment-13549596 ] Robert Muir commented on LUCENE-4674:
--------------------------------------
{quote}
Unfortunately a.copy(otherStuff) will modify b if otherStuff.length > 5.
{quote}

I still like the idea of fixing this myself (maybe Shai's idea?). I don't like this kind of dangerous stuff!! I ultimately think LUCENE-4675 is the next logical step, but can we remove this a.copy()-overwrites-b trap as an incremental improvement? That's a bug in my opinion.
[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow
[ https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549598#comment-13549598 ] Shai Erera commented on LUCENE-4675:
------------------------------------
OK. While you're at it, what do you think about adding an 'upto' member for easier iteration over the bytes/ints/chars? (see my comment on LUCENE-4674)
[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow
[ https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549611#comment-13549611 ] Robert Muir commented on LUCENE-4675:
--------------------------------------
I don't think we need any additional members in this thing. What more does it need other than byte[], offset, length?! I want to remove the extraneous stuff.

If you want to make an iterator, you can separately make your own BytesRefIterator class?
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549625#comment-13549625 ] Adrien Grand commented on LUCENE-4674:
--------------------------------------
bq. I still like the idea of fixing this myself (maybe Shai's idea?). I don't like this kind of dangerous stuff!!

The 'upto' idea, or allocating a new byte[] if someOtherStuff.offset + length > this.offset + length?

bq. I ultimately think LUCENE-4675 is the next logical step, but can we remove this a.copy()-overwrites-b trap as an incremental improvement?

Regarding the idea to switch to the java.nio buffers, are there some traps besides backward compatibility? Should we start migrating our internal APIs to this API (and maybe even the public ones for 5.0)?
[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549630#comment-13549630 ] Robert Muir commented on LUCENE-4674:
--------------------------------------
{quote}
allocating a new byte[] if someOtherStuff.offset + length > this.offset + length?
{quote}

This, preventing a.copy(otherStuff) from overflowing onto b. I don't want any other functionality in this class. It needs less, not more.

{quote}
Regarding the idea to switch to the java.nio buffers, are there some traps besides backward compatibility? Should we start migrating our internal APIs to this API (and maybe even the public ones for 5.0)?
{quote}

I haven't even thought about it really. I actually am less concerned about our internal APIs. It's the public ones I care about. I would care a lot less about BytesRef & co if users weren't forced to interact with them.
[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow
[ https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549632#comment-13549632 ] Shai Erera commented on LUCENE-4675:
------------------------------------
bq. you can separately make your own BytesRefIterator class

I can. I wanted to avoid additional object allocations, but such an Iterator class can have a reset(BytesRef) method which will update pos and upto members accordingly. I was thinking that an 'upto' index might be useful for others. For my purposes (see LUCENE-4620) I just use bytes.offset as 'pos' and compute an 'upto' and pass it along.

I will think about the Iterator class though; perhaps it's not a bad idea. And maybe *Ref can have an iterator() method which returns the proper one ... or not.
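A sketch of the reusable iterator Shai suggests: reset(BytesRef) repositions pos/upto so one instance can be reused across slices without allocation. The class below is hypothetical (and deliberately not named BytesRefIterator, since org.apache.lucene.util.BytesRefIterator already exists and iterates over a sequence of BytesRefs, not the bytes of one):
{code:title=ByteSliceIterator.java}
import org.apache.lucene.util.BytesRef;

public class ByteSliceIterator {
  private byte[] bytes;
  private int pos;
  private int upto;

  /** Repoints the iterator at a new slice; no allocation. */
  public void reset(BytesRef ref) {
    this.bytes = ref.bytes;
    this.pos = ref.offset;
    this.upto = ref.offset + ref.length;
  }

  public boolean hasNext() {
    return pos < upto;
  }

  public byte next() {
    return bytes[pos++];
  }
}
{code}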
[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow
[ https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549637#comment-13549637 ] Robert Muir commented on LUCENE-4675: - I don't think we should add more functionality to these *Ref classes: they have too many traps and bugs already. Less is more here. remove *Ref.copy/append/grow Key: LUCENE-4675 URL: https://issues.apache.org/jira/browse/LUCENE-4675 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir These methods are dangerous: In general if we want a StringBuilder type class, then it should own the array, and it can freely do allocation stuff etc. This is the only way to make it safe. Otherwise if we want a ByteBuffer type class, then its reference should be immutable (the byte[]/offset/length should be final), and it should not have allocation stuff. BytesRef is none of these, it's like a C pointer. Unfortunately Lucene puts these unsafe, dangerous, trappy APIs directly in front of the user. What happens if I have a bug in my application and it accidentally mucks with the term bytes returned by TermsEnum or the payloads from DocsAndPositionsEnum? Will this get merged into a corrupt index? I think as a start we should remove these copy/append/grow to minimize this closer to a ref class (e.g. more like java.lang.ref and less like StringBuilder). Nobody needs this stuff on BytesRef, they can already operate on the bytes directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.6.0) - Build # 70 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/70/ Java: 64bit/jdk1.6.0 -XX:+UseSerialGC All tests passed Build Log: [...truncated 8383 lines...] [junit4:junit4] ERROR: JVM J0 ended with an exception, command line: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=BC09482A7937D842 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/testlogging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dfile.encoding=ISO-8859-1 -classpath
Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.6.0) - Build # 70 - Failure!
JVM Crash: [junit4:junit4] Suite: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest [junit4:junit4] Completed in 32.12s, 1 test [junit4:junit4] [junit4:junit4] JVM J0: stdout was not empty, see: /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20130110_132632_493.sysout [junit4:junit4] JVM J0: stdout (verbatim) [junit4:junit4] Invalid memory access of location 0x0 rip=0x7fff8f93db43 [junit4:junit4] JVM J0: EOF [junit4:junit4] Execution time total: 18 minutes 36 seconds On Thu, Jan 10, 2013 at 8:45 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/70/ Java: 64bit/jdk1.6.0 -XX:+UseSerialGC All tests passed Build Log: [...truncated 8383 lines...] [junit4:junit4] ERROR: JVM J0 ended with an exception, command line: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=BC09482A7937D842 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/testlogging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dfile.encoding=ISO-8859-1 -classpath
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549651#comment-13549651 ] Varun Thacker commented on LUCENE-3354: --- Hi, I have a doubt about FieldCache supporting multiValued fields in general. FieldCache on a multiValued field works by consuming it from FieldCache.DocTermOrds, but: * I was trying out FunctionQuery in Solr and still got a "cannot FieldCache on multiValued field" error. This is because any impl. of FieldCacheSource, for example StrFieldSource#getValues(), returns DocTermsIndexDocValues, where a FieldCache.DocTermsIndex instance loads up. Is this supposed to be consumed like this? * Secondly, slightly off topic, but I went through the lucene4547 branch where there was a discussion on how to consume DocValues. I'm still trying to figure out a lot of stuff around DocValues, FieldCache etc., but do we need to discuss all these issues and their impact on Solr and ES as a whole? Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Core Issue Type: Improvement Reporter: Bill Bell Fix For: 4.0-ALPHA Attachments: LUCENE-3354.patch, LUCENE-3354.patch, LUCENE-3354_testspeed.patch I would consider this a bug. It appears lots of people are working around this limitation; why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow
[ https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549654#comment-13549654 ] Uwe Schindler commented on LUCENE-4675: --- Strong +1 to make BytesRef a byte[] reference only. BytesRef is unfortunately a user-facing class in Lucene 4.x, so we have to look into this. I was also planning to fix this before 4.0, but we had no time. This was one of the last classes Robert and I did not fix in the final cleanup before the release, which is a pity. remove *Ref.copy/append/grow Key: LUCENE-4675 URL: https://issues.apache.org/jira/browse/LUCENE-4675 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir These methods are dangerous: In general if we want a StringBuilder type class, then it should own the array, and it can freely do allocation stuff etc. This is the only way to make it safe. Otherwise if we want a ByteBuffer type class, then its reference should be immutable (the byte[]/offset/length should be final), and it should not have allocation stuff. BytesRef is none of these, it's like a C pointer. Unfortunately Lucene puts these unsafe, dangerous, trappy APIs directly in front of the user. What happens if I have a bug in my application and it accidentally mucks with the term bytes returned by TermsEnum or the payloads from DocsAndPositionsEnum? Will this get merged into a corrupt index? I think as a start we should remove these copy/append/grow to minimize this closer to a ref class (e.g. more like java.lang.ref and less like StringBuilder). Nobody needs this stuff on BytesRef, they can already operate on the bytes directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config
Yago Riveiro Rodríguez created SOLR-4292: Summary: After upload and link config collection, the collection in solrcloud not load the new config Key: SOLR-4292 URL: https://issues.apache.org/jira/browse/SOLR-4292 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: CentOS release 6.3 (Final) Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux Reporter: Yago Riveiro Rodríguez I'm trying to change the settings for a specific collection, which is empty, with a new config. The collection has 2 shards, and the zookeeper is a cluster of 3 servers. I used zookeeper to upload the configuration and link it with the collection. After this, I reloaded the collection on both nodes (replica and leader), but when I try to see the STATUS of the collection's core (/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error: ST-4A46DF1563_0812:org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Specified config does not exist in ZooKeeper:statisticsBucket-aggregation-revision-1 The clusterstate.json shows that ST-4A46DF1563_0812 has loaded the configName: {"configName":"statisticsBucket-aggregation-revision-1"} If zookeeper has the new config loaded and I linked the config to the collection, why does the status of the core say that the configuration is missing? /Yago -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4293) Solr throws an NPE when extracting update handled called with an empty document
Karl Wright created SOLR-4293: - Summary: Solr throws an NPE when extracting update handled called with an empty document Key: SOLR-4293 URL: https://issues.apache.org/jira/browse/SOLR-4293 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Karl Wright When you send an empty document to update/extract, you get this: {code} SEVERE: java.lang.NullPointerException at org.apache.solr.handler.extraction.SolrContentHandler.addLiterals(SolrContentHandler.java:164) at org.apache.solr.handler.extraction.SolrContentHandler.newDocument(SolrContentHandler.java:115) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:383) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4293) Solr throws an NPE when extracting update handled called with an empty document
[ https://issues.apache.org/jira/browse/SOLR-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated SOLR-4293: -- Attachment: SOLR-4293.patch This patch should fix the problem. Solr throws an NPE when extracting update handled called with an empty document --- Key: SOLR-4293 URL: https://issues.apache.org/jira/browse/SOLR-4293 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Karl Wright Attachments: SOLR-4293.patch When you send an empty document to update/extract, you get this: {code} SEVERE: java.lang.NullPointerException at org.apache.solr.handler.extraction.SolrContentHandler.addLiterals(SolrContentHandler.java:164) at org.apache.solr.handler.extraction.SolrContentHandler.newDocument(SolrContentHandler.java:115) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:383) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
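The fix itself is in the attached SOLR-4293.patch; from the stack trace (NPE inside SolrContentHandler.addLiterals) the obvious shape is a null guard around the extracted values being iterated. A hypothetical sketch — the method signature here is invented for illustration and the real patch may well differ:
{code}
// Hypothetical guard: with an empty extracted document the values for a
// field name can be null, so treat null as "nothing to add".
private void addField(SolrInputDocument doc, String name, String[] values) {
  if (values == null) {
    return; // empty document: no extracted values for this field
  }
  for (String value : values) {
    doc.addField(name, value);
  }
}
{code}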
[jira] [Commented] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config
[ https://issues.apache.org/jira/browse/SOLR-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549712#comment-13549712 ] Yago Riveiro Rodríguez commented on SOLR-4292: -- My fault, I wrote the confname parameter incorrectly. Btw, zookeeper's log is so verbose that the error gets no visibility. After upload and link config collection, the collection in solrcloud not load the new config Key: SOLR-4292 URL: https://issues.apache.org/jira/browse/SOLR-4292 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: CentOS release 6.3 (Final) Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux Reporter: Yago Riveiro Rodríguez I'm trying to change the settings for a specific collection, which is empty, with a new config. The collection has 2 shards, and the zookeeper is a cluster of 3 servers. I used zookeeper to upload the configuration and link it with the collection. After this, I reloaded the collection on both nodes (replica and leader), but when I try to see the STATUS of the collection's core (/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error: ST-4A46DF1563_0812:org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Specified config does not exist in ZooKeeper:statisticsBucket-aggregation-revision-1 The clusterstate.json shows that ST-4A46DF1563_0812 has loaded the configName: {"configName":"statisticsBucket-aggregation-revision-1"} If zookeeper has the new config loaded and I linked the config to the collection, why does the status of the core say that the configuration is missing? /Yago -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config
[ https://issues.apache.org/jira/browse/SOLR-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yago Riveiro Rodríguez closed SOLR-4292. Resolution: Not A Problem After upload and link config collection, the collection in solrcloud not load the new config Key: SOLR-4292 URL: https://issues.apache.org/jira/browse/SOLR-4292 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: CentOS release 6.3 (Final) Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux Reporter: Yago Riveiro Rodríguez I'm trying to change the settings for a specific collection, which is empty, with a new config. The collection has 2 shards, and the zookeeper is a cluster of 3 servers. I used zookeeper to upload the configuration and link it with the collection. After this, I reloaded the collection on both nodes (replica and leader), but when I try to see the STATUS of the collection's core (/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error: ST-4A46DF1563_0812:org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Specified config does not exist in ZooKeeper:statisticsBucket-aggregation-revision-1 The clusterstate.json shows that ST-4A46DF1563_0812 has loaded the configName: {"configName":"statisticsBucket-aggregation-revision-1"} If zookeeper has the new config loaded and I linked the config to the collection, why does the status of the core say that the configuration is missing? /Yago -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update
Ben Pennell created SOLR-4294: - Summary: Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update Key: SOLR-4294 URL: https://issues.apache.org/jira/browse/SOLR-4294 Project: Solr Issue Type: Bug Components: clients - java, update Affects Versions: 4.0 Environment: RHEL Reporter: Ben Pennell Priority: Minor Fix For: 4.0.1, 4.1 Setting multiple values to a multivalued field via an XML atomic update request is resulting in what appears to be the output of a toString() method. See the examples below. I ran into this issue using the output for atomic updates from the fix for SOLR-4133 to ClientUtils. The server being used is the base 4.0.0 release.
{code}
curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' -d '
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
  <field name="status" update="set">two</field>
</doc></add>'
{code}
Yields the following in Solr:
{code}
<arr name="status"><str>{set=one}</str><str>{set=two}</str></arr>
{code}
Changing the second set to an add has the same effect. If I only set one value though, it works correctly:
{code}
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
</doc></add>
{code}
Yields:
{code}
<arr name="status"><str>one</str></arr>
{code}
It also works fine if I split it into two operations:
{code}
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
</doc></add>
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="add">two</field>
</doc></add>
{code}
Yields:
{code}
<arr name="status"><str>one</str><str>two</str></arr>
{code}
Oddly, it works fine as a single request in JSON:
{code}
curl -k 'http://localhost/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"test", "status":{"set":["one", "two"]}}]'
{code}
Yields:
{code}
<arr name="status"><str>one</str><str>two</str></arr>
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
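For reference, such an update is typically produced from SolrJ along these lines (a sketch; the server URL and values come from the report above, and it is ClientUtils' XML serialization of this "set" map — the SOLR-4133 change — that the issue concerns):
{code}
import java.util.Arrays;
import java.util.Collections;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicSetExample {
  public static void main(String[] args) throws Exception {
    // A Map value of {"set" -> values} marks the field for an atomic "set";
    // ClientUtils serializes this map into the XML request shown above.
    SolrServer server = new HttpSolrServer("https://localhost/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "test");
    doc.addField("status", Collections.singletonMap("set", Arrays.asList("one", "two")));
    server.add(doc);
    server.commit();
  }
}
{code}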
[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API
[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549759#comment-13549759 ] Michael McCandless commented on LUCENE-4620: Thanks Shai, that new patch worked! This patch looks great! It's a little disturbing that every doc must make a new HashMap<String,BytesRef> at indexing time (seems like a lot of overhead/objects when the common case just needs to return a single BytesRef, which could be re-used). Can we use Collections.singletonMap when there are no partitions? The decode API (more important than encode) looks like it reuses the Bytes/IntsRef, so that's good. Hmm, why do we have VInt8.bytesNeeded? Who uses that? I think that's a dangerous API to have; it's better to simply encode and then see how many bytes it took. Hmm, it's a little abusive how VInt8.decode changes the offset of the incoming BytesRef ... I guess this is why you want an upto :) Net/net this is great progress over what we have today, so +1! I ran a quick 10M English Wikipedia test w/ just term queries:
{noformat}
    Task    QPS base  StdDev    QPS comp  StdDev        Pct diff
HighTerm       12.79  (2.4%)       12.56  (1.2%)   -1.8% ( -5% - 1%)
 MedTerm       18.04  (1.8%)       17.77  (0.8%)   -1.5% ( -4% - 1%)
 LowTerm       47.69  (1.1%)       47.56  (1.0%)   -0.3% ( -2% - 1%)
{noformat}
The test only has 3 ords per doc so it's not typical ... looks like things got a bit slower (or possibly it's noise). Explore IntEncoder/Decoder bulk API --- Key: LUCENE-4620 URL: https://issues.apache.org/jira/browse/LUCENE-4620 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-4620.patch, LUCENE-4620.patch Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) and decode(int). Originally, we believed that this layer can be useful for other scenarios, but in practice it's used only for writing/reading the category ordinals from payload/DV. Therefore, Mike and I would like to explore a bulk API, something like encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder can still be streaming (as we don't know in advance how many ints will be written), dunno. Will figure this out as we go. One thing to check is whether the bulk API can work w/ e.g. facet associations, which can write arbitrary byte[], and so maybe decoding to an IntsRef won't make sense. This too we'll figure out as we go. I don't rule out that associations will use a different bulk API. At the end of the day, the requirement is for someone to be able to configure how ordinals are written (i.e. different encoding schemes: VInt, PackedInts etc.) and later read, with as little overhead as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
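For context, the bulk shape being benchmarked looks roughly like this — a sketch under the issue's proposed decode(BytesRef, IntsRef) signature, using one common variable-length-int layout (the patch's VInt8 byte order and class layout may differ):
{code}
import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;

// Sketch of a bulk decode: read variable-length ints out of a BytesRef into
// a reusable IntsRef in one call, instead of one method call per int.
public final class VIntBulkDecoder {
  public void decode(BytesRef buf, IntsRef ints) {
    ints.offset = ints.length = 0;
    final int upto = buf.offset + buf.length; // iterate to offset+length, not bytes.length
    int pos = buf.offset;
    while (pos < upto) {
      int value = 0;
      byte b;
      do {
        b = buf.bytes[pos++];
        value = (value << 7) | (b & 0x7F); // 7 payload bits per byte
      } while ((b & 0x80) != 0);           // high bit set: more bytes follow
      if (ints.length == ints.ints.length) {
        ints.ints = ArrayUtil.grow(ints.ints, ints.length + 1);
      }
      ints.ints[ints.length++] = value;
    }
  }
}
{code}
The point of the bulk API is visible in the inner loop: the per-int branching stays inside one tight method, rather than being paid across a virtual decode(int) call per ordinal.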
[jira] [Created] (SOLR-4295) SolrQuery setFacet*() and getFacet*() should have versions that specify the field
Colin Bartolome created SOLR-4295: - Summary: SolrQuery setFacet*() and getFacet*() should have versions that specify the field Key: SOLR-4295 URL: https://issues.apache.org/jira/browse/SOLR-4295 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 4.0 Reporter: Colin Bartolome Priority: Minor Since the parameter names for field-specific faceting parameters are a little odd (and undocumented), such as f.field_name.facet.prefix, the SolrQuery class should have methods that take a field parameter. The SolrQuery.setFacetPrefix() method already takes such a parameter. It would be great if the rest of the setFacet*() and getFacet*() methods did, too. The workaround is trivial, albeit clumsy: just create the parameter names by hand, as necessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4295) SolrQuery setFacet*() and getFacet*() should have versions that specify the field
[ https://issues.apache.org/jira/browse/SOLR-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Bartolome updated SOLR-4295: -- Description: Since the parameter names for field-specific faceting parameters are a little odd (and undocumented), such as f.field_name.facet.prefix, the SolrQuery class should have methods that take a field parameter. The SolrQuery.setFacetPrefix() method already takes such a parameter. It would be great if the rest of the setFacet*() and getFacet*() methods did, too. The workaround is trivial, albeit clumsy: just create the parameter names by hand, as necessary. Also, as far as I can tell, there isn't a constant for the f. prefix. That would be helpful, too. was: Since the parameter names for field-specific faceting parameters are a little odd (and undocumented), such as f.field_name.facet.prefix, the SolrQuery class should have methods that take a field parameter. The SolrQuery.setFacetPrefix() method already takes such a parameter. It would be great if the rest of the setFacet*() and getFacet*() methods did, too. The workaround is trivial, albeit clumsy: just create the parameter names by hand, as necessary. SolrQuery setFacet*() and getFacet*() should have versions that specify the field - Key: SOLR-4295 URL: https://issues.apache.org/jira/browse/SOLR-4295 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 4.0 Reporter: Colin Bartolome Priority: Minor Since the parameter names for field-specific faceting parameters are a little odd (and undocumented), such as f.field_name.facet.prefix, the SolrQuery class should have methods that take a field parameter. The SolrQuery.setFacetPrefix() method already takes such a parameter. It would be great if the rest of the setFacet*() and getFacet*() methods did, too. The workaround is trivial, albeit clumsy: just create the parameter names by hand, as necessary. Also, as far as I can tell, there isn't a constant for the f. prefix. That would be helpful, too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
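The trivial-but-clumsy workaround reads roughly like this in SolrJ (a sketch; "category" and the values are examples): assemble the per-field parameter names by hand and set them through ModifiableSolrParams, which SolrQuery extends.
{code}
import org.apache.solr.client.solrj.SolrQuery;

public class PerFieldFacetParams {
  public static void main(String[] args) {
    // Build the per-field facet parameter names ("f.<field>.facet.*") by hand.
    SolrQuery query = new SolrQuery("*:*");
    query.setFacet(true);
    query.addFacetField("category");
    query.set("f.category.facet.limit", 20);    // per-field facet.limit
    query.set("f.category.facet.mincount", 1);  // per-field facet.mincount
    System.out.println(query);                  // q=*:*&facet=true&...
  }
}
{code}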
[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549894#comment-13549894 ] Steve Rowe commented on LUCENE-4431: Can this be resolved now, since 3.6.2 was released? License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt - Key: LUCENE-4431 URL: https://issues.apache.org/jira/browse/LUCENE-4431 Project: Lucene - Core Issue Type: Bug Components: modules/other Affects Versions: 3.6.1, 4.0-BETA Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0, 4.1, 5.0, 3.6.3 Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch - The demo module has servlet-api.jar with an ASF-named license file and the text "TODO: fill in" - This also affects Solr: It has a full ASF license file, but that is wrong. The servlet-api file is CDDL licensed: http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549904#comment-13549904 ] Robert Muir commented on LUCENE-4431: - No, because it wasn't fixed in 3.6.2 License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt - Key: LUCENE-4431 URL: https://issues.apache.org/jira/browse/LUCENE-4431 Project: Lucene - Core Issue Type: Bug Components: modules/other Affects Versions: 3.6.1, 4.0-BETA Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0, 4.1, 5.0, 3.6.3 Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch - The demo module has servlet-api.jar with an ASF-named license file and the text "TODO: fill in" - This also affects Solr: It has a full ASF license file, but that is wrong. The servlet-api file is CDDL licensed: http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549914#comment-13549914 ] Steve Rowe commented on LUCENE-4431: Ah right, fix version is 3.6.3 License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt - Key: LUCENE-4431 URL: https://issues.apache.org/jira/browse/LUCENE-4431 Project: Lucene - Core Issue Type: Bug Components: modules/other Affects Versions: 3.6.1, 4.0-BETA Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0, 4.1, 5.0, 3.6.3 Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch - The demo module has servlet-api.jar with an ASF-named license file and the text "TODO: fill in" - This also affects Solr: It has a full ASF license file, but that is wrong. The servlet-api file is CDDL licensed: http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt
[ https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549930#comment-13549930 ] Robert Muir commented on LUCENE-4431: - I did those automatically (when JIRA releases, it asks you if you want to move out any still-open issues... never saw it before, it's handy though). But yeah, we should still fix this if we do a 3.6.3 IMO. License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt - Key: LUCENE-4431 URL: https://issues.apache.org/jira/browse/LUCENE-4431 Project: Lucene - Core Issue Type: Bug Components: modules/other Affects Versions: 3.6.1, 4.0-BETA Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Blocker Fix For: 4.0, 4.1, 5.0, 3.6.3 Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch - The demo module has servlet-api.jar with an ASF-named license file and the text "TODO: fill in" - This also affects Solr: It has a full ASF license file, but that is wrong. The servlet-api file is CDDL licensed: http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-4134: --- Fix Version/s: 4.1 modify release process/scripts to use svn for rc/release publishing (svnpubsub) --- Key: LUCENE-4134 URL: https://issues.apache.org/jira/browse/LUCENE-4134 Project: Lucene - Core Issue Type: Task Reporter: Hoss Man Priority: Blocker Fix For: 4.1 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be entirely managed using svnpubsub ... our use of the Apache CMS for lucene.apache.org puts us in compliance for our main website, but the dist dir used for publishing release artifacts also needs to be managed via svn. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 4.1 release
As of now, there are two Blocker issues in JIRA with Fix Version 4.1: Dataimporting with SolrCloud Fails https://issues.apache.org/jira/browse/SOLR-4112 modify release process/scripts to use svn for rc/release publishing (svnpubsub) https://issues.apache.org/jira/browse/LUCENE-4134 (LUCENE-4431 - servlet-api.jar licensing - is listed as Blocker with Fix Version including 4.1, but this has been fixed in branch_4x, and was reopened only for 3.6.X backporting.) LUCENE-4547 https://issues.apache.org/jira/browse/LUCENE-4547 (DocValues 2.0) is listed as Blocker with Fix Version including 4.2, but recent commits to branches/lucene4547/ include changes to the Lucene41 codec. Looks like Fix Version should be changed to 4.1? I'd like to release soon. What else blocks this? Steve On Dec 31, 2012, at 2:08 PM, Mark Miller markrmil...@gmail.com wrote: I've started pushing on JIRA issues for a 4.1 release. If something is pushed that you are going to work on in the very near term, please put it back. I'll progressively get more aggressive about pushing and count on committers to fix any mistakes if they want something in 4.1. Remember, 4.2 can come shortly after 4.1. Next I will be pushing any 4.1 issues that have not been updated in a couple months. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API
[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4620: --- Attachment: LUCENE-4620.patch bq. Can we use Collections.singletonMap when there are no partitions? Done. Note though that BytesRef cannot be reused in the case of PerDimensionIndexingParams (i.e. multiple CLPs). This is not the common case, but it's not trivial to specialize it. Maybe as a second iteration. I did put a TODO in FacetFields to allow reuse. bq. why do we have VInt8.bytesNeeded? Who uses that? Currently no one uses it, but it was there and I thought that it's a convenient API to keep. Why encode and then see how many bytes were occupied? Anyway, neither the encoders nor the decoders use it. I have no strong feelings about keeping/removing it, so if you feel like it should be removed, I can do it. bq. Hmm, it's a little abusive how VInt8.decode changes the offset of the incoming BytesRef It is, but that's the result of Java's lack of pass-by-reference. I.e., decode needs to return two values to the caller: the decoded number and how many bytes were read. Notice that in the previous byte[] variant, the method took a class Position, which is horrible. That's why I documented in decode() that it advances bytes.offset, so the caller can restore it in the end. For instance, IntDecoder restores the offset to the original one in the end. On LUCENE-4675 Robert gave me an idea to create a BytesRefIterator, and I started to play with it. I.e. it would wrap a BytesRef but add 'pos' and 'upto' indexes. The user can modify 'pos' freely, without touching bytes.offset. That introduces an object allocation though, and since I'd want to reuse that object wherever possible, I think I'll look at it after finishing this issue. It already contains too many changes. bq. I guess this is why you want an upto No, I wanted upto because iterating up to bytes.length is incorrect. You need to iterate up to offset+length. BytesRefIterator.pos and BytesRefIterator.upto solve these cases for me. bq. looks like things got a bit slower (or possibly it's noise) First, even if it's not noise, the slowdown IMO is worth the code simplification. But, I do believe that we'll see gains when there are more than 3 integers to encode/decode. In fact, the facets test package has an EncodingSpeed class which measures the time it takes to encode/decode a large number of integers (a few thousand). When I compared the result to 4x (i.e. without the patch), the decode time seemed to be ~5x faster. In this patch I added an Ant task run-encoding-benchmark which runs this class. Want to give it a try on your beast machine? For 4x, you can just copy the target to lucene/facet/build.xml; I believe it will work without issues. Explore IntEncoder/Decoder bulk API --- Key: LUCENE-4620 URL: https://issues.apache.org/jira/browse/LUCENE-4620 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) and decode(int). Originally, we believed that this layer can be useful for other scenarios, but in practice it's used only for writing/reading the category ordinals from payload/DV. Therefore, Mike and I would like to explore a bulk API, something like encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef).
Perhaps the Encoder can still be streaming (as we don't know in advance how many ints will be written), dunno. Will figure this out as we go. One thing to check is whether the bulk API can work w/ e.g. facet associations, which can write arbitrary byte[], and so maybe decoding to an IntsRef won't make sense. This too we'll figure out as we go. I don't rule out that associations will use a different bulk API. At the end of the day, the requirement is for someone to be able to configure how ordinals are written (i.e. different encoding schemes: VInt, PackedInts etc.) and later read, with as little overhead as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
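To make the offset-advancing contract above concrete, the documented caller pattern looks roughly like this (a sketch; it assumes the patch's VInt8.decode(BytesRef), which consumes bytes by advancing bytes.offset, as described in the comment):
{code}
import org.apache.lucene.util.BytesRef;

// Sketch of the calling convention: decode() advances bytes.offset so the
// caller can tell how many bytes were consumed; the caller restores the
// original offset at the end, as IntDecoder does.
public final class VInt8CallerSketch {
  public static void readAll(BytesRef bytes) {
    final int originalOffset = bytes.offset;
    final int upto = bytes.offset + bytes.length; // iterate to offset+length, not bytes.length
    while (bytes.offset < upto) {
      int value = VInt8.decode(bytes); // reads one int, advances bytes.offset
      // ... consume value ...
    }
    bytes.offset = originalOffset;     // restore in the end
  }
}
{code}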
[jira] [Updated] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results
[ https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-4286: --- Priority: Blocker (was: Major) Atomic Updates on multi-valued fields giving unexpected results --- Key: SOLR-4286 URL: https://issues.apache.org/jira/browse/SOLR-4286 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: Windows 7 64-bit Reporter: Abhinav Shah Assignee: Shalin Shekhar Mangar Priority: Blocker I am using apache-solr 4.0. I am trying to post the following document -
{code}
curl "http://irvis016:8983/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary '
<add commitWithin="5000"><doc boost="1.0">
  <field name="accessionNumber" update="set">3165297</field>
  <field name="status" update="set">ORDERED</field>
  <field name="account.accountName" update="set">US LABS DEMO ACCOUNT</field>
  <field name="account.addresses.address1" update="set">2601 Campus Drive</field>
  <field name="account.addresses.city" update="set">Irvine</field>
  <field name="account.addresses.state" update="set">CA</field>
  <field name="account.addresses.zip" update="set">92622</field>
  <field name="account.externalIds.sourceSystem" update="set">10442</field>
  <field name="orderingPhysician.lcProviderNumber" update="set">60086</field>
  <field name="patient.lpid" update="set">5571351625769103</field>
  <field name="patient.patientName.lastName" update="set">test</field>
  <field name="patient.patientName.firstName" update="set">test123</field>
  <field name="patient.patientSSN" update="set">643522342</field>
  <field name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z</field>
  <field name="patient.mrNs.mrn" update="set">5423</field>
  <field name="specimens.specimenType" update="set">Bone Marrow</field>
  <field name="specimens.specimenType" update="set">Nerve tissue</field>
  <field name="UID">3165297USLABS2012</field>
</doc></add>'
{code}
This document gets successfully posted. However, the multi-valued field 'specimens.specimenType' gets stored as follows in Solr -
{code}
<arr name="specimens.specimenType">
  <str>{set=Bone Marrow}</str>
  <str>{set=Nerve tissue}</str>
</arr>
{code}
I did not expect "{set=" to be stored along with the text "Bone Marrow". My Solr schema.xml definition for the field specimens.specimenType is -
{code}
<field indexed="true" multiValued="true" name="specimens.specimenType" omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true" stored="true" termVectors="false" type="text_en"/>
{code}
Can someone help? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update
[ https://issues.apache.org/jira/browse/SOLR-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-4294: --- Priority: Blocker (was: Minor) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update --- Key: SOLR-4294 URL: https://issues.apache.org/jira/browse/SOLR-4294 Project: Solr Issue Type: Bug Components: clients - java, update Affects Versions: 4.0 Environment: RHEL Reporter: Ben Pennell Priority: Blocker Fix For: 4.0.1, 4.1 Setting multiple values to a multivalued field via an XML atomic update request is resulting in what appears to be the output of a toString() method. See the examples below. I ran into this issue using the output for atomic updates from the fix for SOLR-4133 to ClientUtils. The server being used is the base 4.0.0 release.
{code}
curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' -d '
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
  <field name="status" update="set">two</field>
</doc></add>'
{code}
Yields the following in Solr:
{code}
<arr name="status"><str>{set=one}</str><str>{set=two}</str></arr>
{code}
Changing the second set to an add has the same effect. If I only set one value though, it works correctly:
{code}
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
</doc></add>
{code}
Yields:
{code}
<arr name="status"><str>one</str></arr>
{code}
It also works fine if I split it into two operations:
{code}
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
</doc></add>
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="add">two</field>
</doc></add>
{code}
Yields:
{code}
<arr name="status"><str>one</str><str>two</str></arr>
{code}
Oddly, it works fine as a single request in JSON:
{code}
curl -k 'http://localhost/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"test", "status":{"set":["one", "two"]}}]'
{code}
Yields:
{code}
<arr name="status"><str>one</str><str>two</str></arr>
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update
[ https://issues.apache.org/jira/browse/SOLR-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-4294: --- Assignee: Shalin Shekhar Mangar Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update --- Key: SOLR-4294 URL: https://issues.apache.org/jira/browse/SOLR-4294 Project: Solr Issue Type: Bug Components: clients - java, update Affects Versions: 4.0 Environment: RHEL Reporter: Ben Pennell Assignee: Shalin Shekhar Mangar Priority: Blocker Fix For: 4.0.1, 4.1 Setting multiple values to a multivalued field via an XML atomic update request is resulting in what appears to be the output of a toString() method. See the examples below. I ran into this issue using the output for atomic updates from the fix for SOLR-4133 to ClientUtils. The server being used is the base 4.0.0 release.
{code}
curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' -d '
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
  <field name="status" update="set">two</field>
</doc></add>'
{code}
Yields the following in Solr:
{code}
<arr name="status"><str>{set=one}</str><str>{set=two}</str></arr>
{code}
Changing the second set to an add has the same effect. If I only set one value though, it works correctly:
{code}
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
</doc></add>
{code}
Yields:
{code}
<arr name="status"><str>one</str></arr>
{code}
It also works fine if I split it into two operations:
{code}
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="set">one</field>
</doc></add>
<add><doc boost="1.0">
  <field name="id">test</field>
  <field name="status" update="add">two</field>
</doc></add>
{code}
Yields:
{code}
<arr name="status"><str>one</str><str>two</str></arr>
{code}
Oddly, it works fine as a single request in JSON:
{code}
curl -k 'http://localhost/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"test", "status":{"set":["one", "two"]}}]'
{code}
Yields:
{code}
<arr name="status"><str>one</str><str>two</str></arr>
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 4.1 release
I set a couple of others to Blocker just now, which are related, probably dups. Shalin is assigned to them both. Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update https://issues.apache.org/jira/browse/SOLR-4294 and Atomic Updates on multi-valued fields giving unexpected results https://issues.apache.org/jira/browse/SOLR-4286 Hopefully these aren't too bad and can make it in as well. Erik On Jan 10, 2013, at 14:12 , Steve Rowe wrote: As of now, there are two Blocker issues in JIRA with Fix Version 4.1: Dataimporting with SolrCloud Fails https://issues.apache.org/jira/browse/SOLR-4112 modify release process/scripts to use svn for rc/release publishing (svnpubsub) https://issues.apache.org/jira/browse/LUCENE-4134 (LUCENE-4431 - servlet-api.jar licensing - is listed as Blocker with Fix Version including 4.1, but this has been fixed in branch_4x, and was reopened only for 3.6.X backporting.) LUCENE-4547 https://issues.apache.org/jira/browse/LUCENE-4547 (DocValues 2.0) is listed as Blocker with Fix Version including 4.2, but recent commits to branches/lucene4547/ include changes to the Lucene41 codec. Looks like Fix Version should be changed to 4.1? I'd like to release soon. What else blocks this? Steve On Dec 31, 2012, at 2:08 PM, Mark Miller markrmil...@gmail.com wrote: I've started pushing on JIRA issues for a 4.1 release. If something is pushed that you are going to work on in the very near term, please put it back. I'll progressively get more aggressive about pushing and count on committers to fix any mistakes if they want something in 4.1. Remember, 4.2 can come shortly after 4.1. Next I will be pushing any 4.1 issues that have not been updated in a couple months. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results
[ https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549957#comment-13549957 ] Yonik Seeley commented on SOLR-4286: Hopefully this is already fixed. Can you try a recent nightly build of 4x (soon to become 4.1)? http://wiki.apache.org/solr/NightlyBuilds Atomic Updates on multi-valued fields giving unexpected results --- Key: SOLR-4286 URL: https://issues.apache.org/jira/browse/SOLR-4286 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: Windows 7 64-bit Reporter: Abhinav Shah Assignee: Shalin Shekhar Mangar Priority: Blocker I am using apache-solr 4.0. I am trying to post the following document -
{code}
curl "http://irvis016:8983/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary '
<add commitWithin="5000"><doc boost="1.0">
  <field name="accessionNumber" update="set">3165297</field>
  <field name="status" update="set">ORDERED</field>
  <field name="account.accountName" update="set">US LABS DEMO ACCOUNT</field>
  <field name="account.addresses.address1" update="set">2601 Campus Drive</field>
  <field name="account.addresses.city" update="set">Irvine</field>
  <field name="account.addresses.state" update="set">CA</field>
  <field name="account.addresses.zip" update="set">92622</field>
  <field name="account.externalIds.sourceSystem" update="set">10442</field>
  <field name="orderingPhysician.lcProviderNumber" update="set">60086</field>
  <field name="patient.lpid" update="set">5571351625769103</field>
  <field name="patient.patientName.lastName" update="set">test</field>
  <field name="patient.patientName.firstName" update="set">test123</field>
  <field name="patient.patientSSN" update="set">643522342</field>
  <field name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z</field>
  <field name="patient.mrNs.mrn" update="set">5423</field>
  <field name="specimens.specimenType" update="set">Bone Marrow</field>
  <field name="specimens.specimenType" update="set">Nerve tissue</field>
  <field name="UID">3165297USLABS2012</field>
</doc></add>'
{code}
This document gets successfully posted. However, the multi-valued field 'specimens.specimenType' gets stored as follows in Solr -
{code}
<arr name="specimens.specimenType">
  <str>{set=Bone Marrow}</str>
  <str>{set=Nerve tissue}</str>
</arr>
{code}
I did not expect "{set=" to be stored along with the text "Bone Marrow". My Solr schema.xml definition for the field specimens.specimenType is -
{code}
<field indexed="true" multiValued="true" name="specimens.specimenType" omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true" stored="true" termVectors="false" type="text_en"/>
{code}
Can someone help? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 4.1 release
On Thu, Jan 10, 2013 at 11:12 AM, Steve Rowe sar...@gmail.com wrote: LUCENE-4547 https://issues.apache.org/jira/browse/LUCENE-4547 (DocValues 2.0) is listed as Blocker with Fix Version including 4.2, but recent commits to branches/lucene4547/ include changes to the Lucene41 codec. Looks like Fix Version should be changed to 4.1? This is a pretty bad bug (you cannot use docvalues with large segments: I initially made it blocker for that reason), but I think we are making good progress at a good pace. My personal opinion: Its fine to just move it out to 4.2, I'd rather have the time to get everything nice. A 4.1 would be an improvement on its own, even if there are known problems like that. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 4.1 release
On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote: I'd like to release soon. What else blocks this? I think we should toss out a short term date (next tuesday?) for anyone to get in what they need for 4.1. Then just consider blockers after branching? Then release? Objections, better ideas? I think we should give a bit of time for people to finish up what's in flight or fix any blockers. Then we should heighten testing and allow for any new blockers, and then kick it out. If we need to do a 4.2 shortly after, so be it. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API
[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549961#comment-13549961 ] Michael McCandless commented on LUCENE-4620: {quote} bq. Can we use Collections.singletonMap when there are no partitions? Done. Note though that BytesRef cannot be reused in the case of PerDimensionIndexingParams (i.e. multiple CLPs). This is not the common case, but it's not trivial to specialize it. Maybe as a second iteration. I did put a TODO in FacetFields to allow reuse. {quote} Well, we'd somehow need N BytesRefs to reuse (one per CLP) ... but I don't think we should worry about that now. It is unfortunate that the common case is often held back by the full flexibility/generality of the facet module ... sometimes I think we need a facet-light module. But maybe if we can get the specialization done we don't need facet-light ... {quote} bq. why do we have VInt8.bytesNeeded? Who uses that? Currently no one uses it, but it was there and I thought that it's a convenient API to keep. Why encode and then see how many bytes were occupied? Anyway, neither the encoders nor the decoders use it. I have no strong feelings for keeping/removing it, so if you feel like it should be removed, I can do it. {quote} I think we should remove it: it's a dangerous API because it can encourage consumers to do things like call bytesNeeded first (to know how much to grow their buffer, say) followed by encoding. The slow part of vInt encoding is all those ifs ... {quote} bq. Hmm, it's a little abusive how VInt8.decode changes the offset of the incoming BytesRef It is, but that's the result of Java's lack of pass by reference. I.e., decode needs to return the caller two values: the decoded number and how many bytes were read. Notice that in the previous byte[] variant, the method took a class Position, which is horrible. That's why I documented in decode() that it advances bytes.offset, so the caller can restore it in the end. For instance, IntDecoder restores the offset to the original one in the end. On LUCENE-4675 Robert gave me an idea to create a BytesRefIterator, and I started to play with it. I.e. it would wrap a BytesRef but add 'pos' and 'upto' indexes. The user can modify 'pos' freely, without touching bytes.offset. That introduces an object allocation though, and since I'd want to reuse that object wherever possible, I think I'll look at it after finishing this issue. It already contains too many changes. {quote} OK. {quote} bq. I guess this is why you want an upto No, I wanted upto because iterating up to bytes.length is incorrect. You need to iterate up to offset+length. BytesRefIterator.pos and BytesRefIterator.upto solve these cases for me. {quote} OK. {quote} bq. looks like things got a bit slower (or possibly it's noise) First, even if it's not noise, the slowdown IMO is worth the code simplification. {quote} +1 {quote} But, I do believe that we'll see gains when there are more than 3 integers to encode/decode. In fact, the facets test package has an EncodingSpeed class which measures the time it takes to encode/decode a large number of integers (a few thousand). When I compared the result to 4x (i.e. without the patch), the decode time seemed to be ~x5 faster. {quote} Good! Would be nice to have a real-world biggish-number-of-facets benchmark ... I'll ponder how to do that w/ luceneutil. bq. In this patch I added an Ant task run-encoding-benchmark which runs this class. Want to give it a try on your beast machine?
For 4x, you can just copy the target to lucene/facet/build.xml, I believe it will work without issues. OK I'll run it! Explore IntEncoder/Decoder bulk API --- Key: LUCENE-4620 URL: https://issues.apache.org/jira/browse/LUCENE-4620 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) and decode(int). Originally, we believed that this layer can be useful for other scenarios, but in practice it's used only for writing/reading the category ordinals from payload/DV. Therefore, Mike and I would like to explore a bulk API, something like encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder can still be streaming (as we don't know in advance how many ints will be written), dunno. Will figure this out as we go. One thing to check is whether the bulk API can work w/ e.g. facet associations, which can write arbitrary byte[], and so maybe decoding to an IntsRef won't make sense. This too we'll figure out as we go. I don't rule out that associations will use a different bulk API.
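To make the shape of the proposed bulk API concrete, here is a minimal illustrative sketch of bulk vInt encode/decode. The class and method names are hypothetical, plain int[]/byte[] stand in for Lucene's IntsRef/BytesRef, and the format is a generic low-7-bits-first vInt, not necessarily the facet module's exact VInt8:
{code:title=BulkVInt.java}
// Illustrative sketch in the spirit of encode(IntsRef, BytesRef) /
// decode(BytesRef, IntsRef); assumes non-negative values.
public final class BulkVInt {

  // Encodes values[0..count) as vInts into out; returns bytes written.
  public static int encode(int[] values, int count, byte[] out) {
    int upto = 0;
    for (int i = 0; i < count; i++) {
      int v = values[i];
      while ((v & ~0x7F) != 0) {                  // more than 7 bits remain
        out[upto++] = (byte) ((v & 0x7F) | 0x80); // continuation bit set
        v >>>= 7;
      }
      out[upto++] = (byte) v;                     // final byte, high bit clear
    }
    return upto;
  }

  // Decodes length bytes starting at offset into values; returns ints decoded.
  public static int decode(byte[] in, int offset, int length, int[] values) {
    int pos = offset, end = offset + length, count = 0;
    while (pos < end) {
      int b = in[pos++], v = b & 0x7F, shift = 7;
      while ((b & 0x80) != 0) {                   // continuation bit still set
        b = in[pos++];
        v |= (b & 0x7F) << shift;
        shift += 7;
      }
      values[count++] = v;
    }
    return count;
  }
}
{code}
Note the decoder tracks its own local position rather than mutating a shared offset, which sidesteps the bytes.offset mutation problem discussed above.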
Re: 4.1 release
I'd like to start sooner than next Tuesday. I propose to make the branch tomorrow, and only allow Blocker issues to hold up the release after that. A release candidate should then be possible by the middle of next week. Steve On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote: On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote: I'd like to release soon. What else blocks this? I think we should toss out a short term date (next tuesday?) for anyone to get in what they need for 4.1. Then just consider blockers after branching? Then release? Objections, better ideas? I think we should give a bit of time for people to finish up what's in flight or fix any blockers. Then we should heighten testing and allow for any new blockers, and then kick it out. If we need to do a 4.2 shortly after, so be it. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #734: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/734/ 1 tests failed. FAILED: org.apache.solr.cloud.SyncSliceTest.testDistribSearch Error Message: shard1 should have just been set up to be inconsistent - but it's still consistent Stack Trace: java.lang.AssertionError: shard1 should have just been set up to be inconsistent - but it's still consistent at __randomizedtesting.SeedInfo.seed([5A32B9FE8374BE51:DBD437E6F42BDE6D]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:214) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:794) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API
[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549975#comment-13549975 ] Michael McCandless commented on LUCENE-4620: Trunk:
{noformat}
[java] Estimating ~1 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] --------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                     18.4955  4430            44.3003              1162            11.6201
[java] Sorting (Unique (VInt8))                                  18.4955  4344            43.4403              1105            11.0501
[java] Sorting (Unique (DGap (VInt8)))                            8.5597  4481            44.8103               842             8.4201
[java] Sorting (Unique (DGap (EightFlags (VInt8))))               4.9679  4636            46.3603              1021            10.2101
[java] Sorting (Unique (DGap (FourFlags (VInt8))))                4.8198  4515            45.1503              1001            10.0101
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))    4.5794  4904            49.0403              1056            10.5601
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))    4.5794  4751            47.5103              1035            10.3501
[java]
[java]
[java] Estimating ~1 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] --------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                     18.2673  1241            12.4100              1128            11.2800
[java] Sorting (Unique (VInt8))                                  18.2673  3488            34.8801               924             9.2400
[java] Sorting (Unique (DGap (VInt8)))                            8.9456  3061            30.6101               660             6.6000
[java] Sorting (Unique (DGap (EightFlags (VInt8))))               5.7542  3693            36.9301              1026            10.2600
[java] Sorting (Unique (DGap (FourFlags (VInt8))))                5.5447  3462            34.6201               811             8.1100
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))    5.3566  3846            38.4601              1018            10.1800
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))    5.3996  3879            38.7901              1025            10.2500
[java]
[java]
[java] Estimating ~1 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 1 (unsorted, length of: 18) 555 times.
[java]
[java] Encoder                                                  Bits/Int  Encode Time     Encode Time          Decode Time     Decode Time
[java]                                                                    [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] --------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                     20.8889  1179            11.7900              1114            11.1400
[java] Sorting (Unique (VInt8))                                  20.8889  2251            22.5100              1171            11.7100
[java] Sorting (Unique (DGap (VInt8)))                           12.      2174            21.7400               848             8.4800
[java] Sorting (Unique (DGap (EightFlags (VInt8))))              10.      2372            23.7200              1092
{noformat}
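For readers wanting to relate the columns: microseconds per int is just total time divided by (iterations × payload length). A toy timing harness in that spirit (hypothetical names; this is not the facet module's EncodingSpeed class, and the data is arbitrary):
{code:title=EncodeTimingSketch.java}
public class EncodeTimingSketch {
  public static void main(String[] args) {
    final int numInts = 2430;  // payload length, as in the first block above
    final int iters = 41152;
    int[] values = new int[numInts];
    for (int i = 0; i < numInts; i++) values[i] = (i * 7) % 1000; // arbitrary data
    byte[] out = new byte[numInts * 5]; // worst case: 5 bytes per vInt

    long start = System.nanoTime();
    for (int it = 0; it < iters; it++) {
      encode(values, out); // stand-in for the encoder under test
    }
    long elapsedNs = System.nanoTime() - start;
    double ms = elapsedNs / 1000000.0;
    double usPerInt = (elapsedNs / 1000.0) / ((double) iters * numInts);
    System.out.printf("%.0f ms total, %.4f microsecond / int%n", ms, usPerInt);
  }

  // Minimal vInt encode so the harness is self-contained.
  static int encode(int[] values, byte[] out) {
    int upto = 0;
    for (int v : values) {
      while ((v & ~0x7F) != 0) { out[upto++] = (byte) ((v & 0x7F) | 0x80); v >>>= 7; }
      out[upto++] = (byte) v;
    }
    return upto;
  }
}
{code}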
[jira] [Created] (LUCENE-4677) Use vInt to encode node addresses inside FST
Michael McCandless created LUCENE-4677: -- Summary: Use vInt to encode node addresses inside FST Key: LUCENE-4677 URL: https://issues.apache.org/jira/browse/LUCENE-4677 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 4.2, 5.0 Today we use int, but towards enabling 2.1G sized FSTs, I'd like to make this vInt instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-4677) Use vInt to encode node addresses inside FST
[ https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-4677: -- Assignee: Michael McCandless -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results
[ https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550245#comment-13550245 ] Abhinav Shah commented on SOLR-4286: I tried on nightly build - apache-solr-4.1-2013-01-10_05-50-28.zip, and it works. Thanks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4677) Use vInt to encode node addresses inside FST
[ https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4677: --- Attachment: LUCENE-4677.patch Initial patch ... not committable until I add a back-compat layer somehow ... (how come TestBackCompat isn't failing...). I tested Kuromoji's TokenInfo FST, temporarily turning off packing: vInt encoding made the non-packed FST ~12% smaller (good!). The packed FST is unchanged in size. Then I tested on a bigger FST (AnalyzingSuggester build of FreeDB's song titles) and the resulting FST is nearly the same size (1.0463 GB for trunk and 1.0458 GB with the patch). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results
[ https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-4286. Resolution: Duplicate Assignee: (was: Shalin Shekhar Mangar) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550352#comment-13550352 ] Steve Rowe commented on LUCENE-4134: bq. [A]s part of this new process there will also be a "https://dist.apache.org/repos/dist/dev/lucene" directory where release candidates can be put for review (instead of people.apache.org/~releasemanager/...), and if/when they are voted successfully a simple svn mv to dist/release/lucene makes them official and pushes them to the mirrors. There is a wrinkle here: maven artifacts. Our current process includes them with the ASF release artifacts at the RC review download link. If we continue this when we instead commit RCs to {{repos/dist/dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/}}, then the release publishing process can't be just {{svn mv dev/lucene/{java,solr}/X.Y.ZRCN-rMMM release/lucene/{java,solr}/X.Y.Z}}. Instead, we'll have to somehow exclude the maven artifacts, e.g. {{svn rm dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/maven}}. An alternative: now that we stage maven artifacts to Nexus (repository.apache.org) prior to the release, we could as part of an RC announcement also include the Nexus link. This option gets my +1. modify release process/scripts to use svn for rc/release publishing (svnpubsub) --- Key: LUCENE-4134 URL: https://issues.apache.org/jira/browse/LUCENE-4134 Project: Lucene - Core Issue Type: Task Reporter: Hoss Man Priority: Blocker Fix For: 4.1 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be entirely managed using svnpubsub ... our use of the Apache CMS for lucene.apache.org puts us in compliance for our main website, but the dist dir use for publishing release artifacts also needs to be managed via svn. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 4.1 release
-1 from me - I don't like not giving people a target date to clean things up by. No one has given a proposed date to try and tie things up by - just calling 'hike is tomorrow' out of nowhere doesn't seem right to me. We have a lot of people working on this over a lot of timezones. I think we should do the right thing and give everyone at least a few days and a weekend to finish getting their issues into 4.1. - Mark On Jan 10, 2013, at 2:36 PM, Steve Rowe sar...@gmail.com wrote: I'd like to start sooner than next Tuesday. I propose to make the branch tomorrow, and only allow Blocker issues to hold up the release after that. A release candidate should then be possible by the middle of next week. Steve On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote: On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote: I'd like to release soon. What else blocks this? I think we should toss out a short term date (next tuesday?) for anyone to get in what they need for 4.1. Then just consider blockers after branching? Then release? Objections, better ideas? I think we should give a bit of time for people to finish up what's in flight or fix any blockers. Then we should heighten testing and allow for any new blockers, and then kick it out. If we need to do a 4.2 shortly after, so be it. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550352#comment-13550352 ] Steve Rowe edited comment on LUCENE-4134 at 1/10/13 8:09 PM: - bq. [A]s part of this new process there will also be a "https://dist.apache.org/repos/dist/dev/lucene" directory where release candidates can be put for review (instead of people.apache.org/~releasemanager/...), and if/when they are voted successfully a simple svn mv to dist/release/lucene makes them official and pushes them to the mirrors. There is a wrinkle here: maven artifacts. Our current process includes them with the ASF release artifacts at the RC review download link. If we continue this when we instead commit RCs to {{repos/dist/dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM/}}, then the release publishing process can't be just {{svn mv dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM release/lucene/\{java,solr}/X.Y.Z}}. Instead, we'll have to somehow exclude the maven artifacts, e.g. {{svn rm dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM/maven}}. An alternative: now that we stage maven artifacts to Nexus (repository.apache.org) prior to the release, we could as part of an RC announcement also include the Nexus link, and not include the maven artifacts in {{repos/dist/dev/lucene/}}. This option gets my +1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
Michael McCandless created LUCENE-4678: -- Summary: FST should use paged byte[] instead of single contiguous byte[] Key: LUCENE-4678 URL: https://issues.apache.org/jira/browse/LUCENE-4678 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.2, 5.0 The single byte[] we use today has several limitations, eg it limits us to 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and it causes big RAM spikes during building when the array has to grow. I took basically the same approach as LUCENE-3298, but I want to break out this patch separately from changing all int -> long for 2.1 GB support. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
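As a rough sketch of the paged idea (illustrative only; the actual patch on this issue differs in detail, and the class name below is hypothetical), a paged store keeps a list of fixed-size blocks and addresses bytes by long position, so growing allocates one block at a time instead of copying the whole array:
{code:title=PagedByteStore.java}
import java.util.ArrayList;
import java.util.List;

// Minimal paged byte store: fixed-size blocks, long addressing.
public final class PagedByteStore {
  private static final int BLOCK_BITS = 15;             // 32 KB blocks
  private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
  private static final int BLOCK_MASK = BLOCK_SIZE - 1;

  private final List<byte[]> blocks = new ArrayList<byte[]>();
  private long size;                                    // bytes written so far

  public void writeByte(byte b) {
    int blockIndex = (int) (size >>> BLOCK_BITS);
    if (blockIndex == blocks.size()) {
      blocks.add(new byte[BLOCK_SIZE]);                 // grow one block at a time
    }
    blocks.get(blockIndex)[(int) (size & BLOCK_MASK)] = b;
    size++;
  }

  public byte getByte(long pos) {
    // A long position splits into block index + offset, so the store
    // is not limited to Integer.MAX_VALUE total bytes.
    return blocks.get((int) (pos >>> BLOCK_BITS))[(int) (pos & BLOCK_MASK)];
  }

  public long size() { return size; }
}
{code}
This avoids both the 2.1 GB ceiling (positions are long) and the copy-on-grow RAM spike, at the cost of one extra index computation per access.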
[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4678: --- Attachment: LUCENE-4678.patch Patch, I think it's close to ready (no format change for the FST so no back compat). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4678: --- Attachment: LUCENE-4678.patch Duh, wrong patch ... this one should be right. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3298) FST has hard limit max size of 2.1 GB
[ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3298: -- Assignee: Michael McCandless FST has hard limit max size of 2.1 GB - Key: LUCENE-3298 URL: https://issues.apache.org/jira/browse/LUCENE-3298 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Attachments: LUCENE-3298.patch The FST uses a single contiguous byte[] under the hood, which in Java is indexed by int so we cannot grow this over Integer.MAX_VALUE. It also internally encodes references to this array as vInt. We could switch this to a paged byte[] and make the FST far larger. But I think this is low priority... I'm not going to work on it any time soon. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3298) FST has hard limit max size of 2.1 GB
[ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3298: --- Attachment: LUCENE-3298.patch Initial test to confirm FSTs can grow beyond 2GB (it fails today!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550373#comment-13550373 ] Robert Muir commented on LUCENE-4134: - Wouldn't another alternative instead just continue to use our p.a.o/~ versus deploying to two places? I don't like having to check a release spread across two different places. And this would also make automatic verification difficult (today, we can pass the p.a.o link and it checks everything) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 4.1 release
Okay - I can see your logic, Mark, but this is not even close to out of nowhere. You yourself have been vocal about making a 4.1 release for a couple weeks now. I agree with Robert Muir that we should be promoting short-turnaround releases. If it doesn't make this release, it'll make the next one, which will come out in a relatively short span of time. In this model, Blocker issues are the drivers, not Fix Version. If people want stuff in the release, they should mark their issue as Blocker. How about a compromise - next Monday we branch and only allow Blockers to block the release? Steve On Jan 10, 2013, at 3:08 PM, Mark Miller markrmil...@gmail.com wrote: -1 from me - I don't like not giving people a target date to clean things up by. No one has given a proposed date to try and tie things up by - just calling 'hike is tomorrow' out of nowhere doesn't seem right to me. We have a lot of people working on this over a lot of timezones. I think we should do the right thing and give everyone at least a few days and a weekend to finish getting their issues into 4.1. - Mark On Jan 10, 2013, at 2:36 PM, Steve Rowe sar...@gmail.com wrote: I'd like to start sooner than next Tuesday. I propose to make the branch tomorrow, and only allow Blocker issues to hold up the release after that. A release candidate should then be possible by the middle of next week. Steve On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote: On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote: I'd like to release soon. What else blocks this? I think we should toss out a short term date (next tuesday?) for anyone to get in what they need for 4.1. Then just consider blockers after branching? Then release? Objections, better ideas? I think we should give a bit of time for people to finish up what's in flight or fix any blockers. Then we should heighten testing and allow for any new blockers, and then kick it out. If we need to do a 4.2 shortly after, so be it. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4547) DocValues field broken on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4547: Priority: Major (was: Blocker) DocValues field broken on large indexes --- Key: LUCENE-4547 URL: https://issues.apache.org/jira/browse/LUCENE-4547 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.2, 5.0 Attachments: test.patch I tried to write a test to sanity check LUCENE-4536 (first running against svn revision 1406416, before the change). But I found docvalues is already broken here for large indexes that have a PackedLongDocValues field:
{code}
final int numDocs = 5;
for (int i = 0; i < numDocs; ++i) {
  if (i == 0) {
    field.setLongValue(0L); // force 32bit deltas
  } else {
    field.setLongValue(133L);
  }
  w.addDocument(doc);
}
w.forceMerge(1);
w.close();
dir.close();
// checkindex
{code}
{noformat}
[junit4:junit4] 2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #0,6,TGRP-Test2GBDocValues]
[junit4:junit4] 2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: -65536
[junit4:junit4] 2>   at __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
[junit4:junit4] 2>   at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:535)
[junit4:junit4] 2>   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:508)
[junit4:junit4] 2> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65536
[junit4:junit4] 2>   at org.apache.lucene.util.ByteBlockPool.deref(ByteBlockPool.java:305)
[junit4:junit4] 2>   at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(FixedStraightBytesImpl.java:115)
[junit4:junit4] 2>   at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(PackedIntValues.java:109)
[junit4:junit4] 2>   at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(PackedIntValues.java:80)
[junit4:junit4] 2>   at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:130)
[junit4:junit4] 2>   at org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:65)
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
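The ArrayIndexOutOfBoundsException with a negative index is the classic signature of 32-bit offset arithmetic overflowing once a segment holds more than 2^31 bytes of value data. A self-contained sketch of that failure mode (illustrative numbers only; this is not the actual ByteBlockPool code):
{code:title=AddressOverflowSketch.java}
public class AddressOverflowSketch {
  public static void main(String[] args) {
    // 50M docs * 43 bytes of value data per doc exceeds Integer.MAX_VALUE bytes.
    int docID = 50000000;
    int bytesPerValue = 43;

    int badOffset = docID * bytesPerValue;          // overflows int: goes negative
    long goodOffset = (long) docID * bytesPerValue; // correct 64-bit arithmetic

    System.out.println("int math:  " + badOffset);  // -2144967296: indexing with this throws AIOOBE
    System.out.println("long math: " + goodOffset); // 2150000000
  }
}
{code}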
[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550383#comment-13550383 ] Hoss Man commented on LUCENE-4134: -- bq. Wouldn't another alternative instead just continue to use our p.a.o/~ versus deploying to two places? +1 I would suggest that for now we move forward with the simplest possible changes to our overall processes that satisfy infra: using the new svn repo for our final release dist, but leave everything else related to RCs, and smoke checking, as is. Then we can discuss/iterate on other changes to the release process at our leisure (ie: maybe we put the RCs in svn, and tweak the directory structure so a simple svn mv works for the dist files, and we have some other script for the maven files) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 4.1 release
Saying tomorrow without any date that gives anyone any time to do anything is out of nowhere to me. People in Europe and east of that will wake up and find out, oh today. While pressure has been building towards a release, no one has proposed a date for a cutoff. I think that is always only fair. I think that if you were desperate to cut off to blockers tomorrow, you should have called for that last week. Robert Muir's short term releases are not threatened by allowing people to plan and execute a release together. You can take that too far and do damage from the opposite direction. Giving people time to tie things up with a real deadline is only fair. We all know a nebulous deadline is not conducive to finishing up work. I think all releases should have a known date that we agree on that gives developers some time to finish what they are working on or what they believe is important for the release. At a minimum there should be a few days for this. A weekend involved only seems fair. This doesn't have to be a long time, but it should not require we file blockers and just seems like a friendly way to develop together. Monday is fine by me if others buy into it. Otherwise, we have taken 4 or 5 months for 4.1. Let's not drag it out another month. But let's not do the reverse and release it tonight. The sensible approach always seems like we should plan out some target dates on the list - dates that actually give devs a chance to respond to - and then follow through on those dates. - Mark On Jan 10, 2013, at 3:26 PM, Steve Rowe sar...@gmail.com wrote: Okay - I can see your logic, Mark, but this is not even close to out of nowhere. You yourself have been vocal about making a 4.1 release for a couple weeks now. I agree with Robert Muir that we should be promoting short turnaround releases. If it doesn't make this release, it'll make the next one, which will come out in a relatively short span of time. In this model, Blocker issues are the drivers, not Fix Version.If people want stuff in the release, they should mark their issue as Blocker. How about a compromise - next Monday we branch and only allow Blockers to block the release? Steve On Jan 10, 2013, at 3:08 PM, Mark Miller markrmil...@gmail.com wrote: -1 from me - I don't like not giving people a target date to clean things up by. No one has given a proposed date to try and tie things up by - just calling 'hike is tomorrow' out of nowhere doesn't seem right to me. We have a lot of people working on this over a lot of timezones. I think we should do the right thing and give everyone at least a few days and a weekend to finish getting their issues into 4.1. - Mark On Jan 10, 2013, at 2:36 PM, Steve Rowe sar...@gmail.com wrote: I'd like to start sooner than next Tuesday. I propose to make the branch tomorrow, and only allow Blocker issues to hold up the release after that. A release candidate should then be possible by the middle of next week. Steve On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote: On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote: I'd like to release soon. What else blocks this? I think we should toss out a short term date (next tuesday?) for anyone to get in what they need for 4.1. Then just consider blockers after branching? Then release? Objections, better ideas? I think we should give a bit of time for people to finish up what's in flight or fix any blockers. Then we should heighten testing and allow for any new blockers, and then kick it out. If we need to do a 4.2 shortly after, so be it. 
- Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550399#comment-13550399 ] Steve Rowe commented on LUCENE-4134: bq. Wouldn't another alternative instead just continue to use our p.a.o/~ versus deploying to two places? Yes, you're right: +1 bq. Then we can discuss/iterate on other changes to the release process at our leisure (ie: maybe we put the RCs in svn, and tweak the directory structure so a simple svn mv works for the dist files, and we have some other script for the maven files) If the {{maven/}} directories weren't there, a simple svn mv would work - no other tweaking required. What other script did you have in mind for the maven files? Are you talking about the need to change the smoke tester if the maven artifacts are moved out of the RC? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550406#comment-13550406 ] Robert Muir commented on LUCENE-4134: - Personally I would prefer if we don't have a separate script for changing the maven files. I'm not really sure what this tester is currently doing: but in my opinion if someone gets Lucene 4.1, I should know WTF they got, regardless of whether it's from an FTP site or maven. So if it doesn't exist now, at least in the future I'd like more logic cross-checking between the two things to ensure they are consistent with each other. It's scary to me that different build systems are producing different artifacts and we don't have this today. And I know the checking isn't good enough when I see basic shit like things not even named the same way: SOLR-4287 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3982) Admin UI: Various Dataimport Improvements
[ https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3982: Attachment: SOLR-3982.patch Updated patch incorporates SOLR-4151 (normally I try to handle issues separately, but this time it's easier to combine them) Additionally changed: * Show Info-Area also for 'idle' status * Make Auto-Refresh optional via Checkbox * Requests are now JSON and no longer XML _(excluding the Configuration, which is only available in XML)_ Admin UI: Various Dataimport Improvements - Key: SOLR-3982 URL: https://issues.apache.org/jira/browse/SOLR-3982 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 4.0 Reporter: Shawn Heisey Assignee: Stefan Matheis (steffkes) Fix For: 4.2, 5.0 Attachments: SOLR-3982.patch, SOLR-3982.patch Started with Shawn's request about a small refresh link; one change led to the next, which is why I changed this issue into a more general one. This patch brings: * A Refresh Status Button * An Abort Import Button * Improved Status-Handling _(was buggy if you had multiple cores with handlers for Dataimport defined and you switched the view while at least one was running)_ * Additional stats on Rows/Documents _(on-the-fly calculated X Docs/second)_ * Less buggy duration-to-readable-time conversion _(until now this resulted in NaNs showing up on your screen)_ Original Description: {quote}The dataimport section under each core on the admin gui does not provide a way to get the current import status. I actually would like to see it automatically pull the status as soon as you click on Dataimport ... I have never seen an import status with a qtime above 1 millisecond. A refresh icon/link would be good to have as well. Additional note: the resulting URL in the address bar is a little odd: http://server:port/solr/#/corename/dataimport//dataimport{quote} Although I gave a short explanation of the URL looking a bit odd: the first dataimport is required for the UI to detect which section you're browsing .. the second /dataimport (including the slash, yes) is coming from your solrconfig :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4151) DIH 'debug' mode missing from 4.x UI
[ https://issues.apache.org/jira/browse/SOLR-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) resolved SOLR-4151. - Resolution: Duplicate Fix Version/s: 4.1 Assignee: Stefan Matheis (steffkes) Marking as 'Duplicate', not completely correct but imho better than a (stupid) 'Fixed' DIH 'debug' mode missing from 4.x UI Key: SOLR-4151 URL: https://issues.apache.org/jira/browse/SOLR-4151 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0 Reporter: Hoss Man Assignee: Stefan Matheis (steffkes) Fix For: 4.1 The new Admin UI in trunk 4.x supports most of the DIH related functionality but the debug options were not implemented. http://wiki.apache.org/solr/DataImportHandler#Interactive_Development_Mode -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550436#comment-13550436 ] Greg Bowyer commented on LUCENE-3178:

{quote} I think this is largely related to Robert's comment: Might be interesting to revisit now that we use block compression that doesn't readByte(), readByte(), readByte() and hopefully avoids some of the bounds checks and so on that I think it helped with. {quote}

Actually there is still quite a lot of that. I wrote locally a Directory implementation that dumps out all of the called operations; I can share the file if wanted (although it's *huge*).

{quote} Since we moved to block codecs, the use of single-byte gets on the byte buffer is largely reduced. It now just reads blocks of data, so MappedByteBuffer can do that efficiently using a memcpy(). Some MTQs are still faster because they read many more blocks for a large number of terms. I would have expected no significant speed up at all for, e.g., NRQ. {quote}

Better: the JVM doesn't do memcpy in all cases, but often does cpu-aware operations that are faster.

{quote} Additionally, when using the ByteBuffer methods to get bytes, I think newer java versions use intrinsics, that may no longer be used with your directory impl. {quote}

This is what I am leaning towards; so far the only speedups I have seen are when I ape most of the behaviors of the JVM. The biggest win really is that the code becomes a lot simpler (partly because we don't have to worry about the cleaner, and partly because we are not bound to int32 sizes, so no more slice nonsense); despite the simpler code I don't think there is a sizable win in performance to warrant this approach. I am still poking at this for a bit longer, but I am leaning towards calling this a bust. The other reason for this was to see if I get better behavior along the MADV_WILLNEED / page-alignment fronts, but again I have nothing scientifically provable there. (This is all assuming that I don't have some gross oversight in my implementation that makes it stupid slow by accident.)

{quote} I would not provide a custom MMapDir at all, it is too risky and does not really bring a large speed up anymore (Java 7 + block postings). {quote}

I quite agree; even if this gave huge performance wins I would still put it in the bucket of "it's in misc, it's not default, and you're on your own if it breaks". The fact that it yields AFAICT no performance gains is both maddening for me and even more damning.

Native MMapDir
--------------
Key: LUCENE-3178
URL: https://issues.apache.org/jira/browse/LUCENE-3178
Project: Lucene - Core
Issue Type: Improvement
Components: core/store
Reporter: Michael McCandless
Labels: gsoc2012, lucene-gsoc-12
Attachments: LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch

Spinoff from LUCENE-2793. Just like we will create native Dir impl (UnixDirectory) to pass the right OS-level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle.
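To illustrate the block-read point in the comment above, a minimal sketch (not Lucene's actual MMapDirectory code; the file name is a placeholder): a bulk get on a MappedByteBuffer can be serviced by a single memcpy-like intrinsic, while a per-byte loop pays call overhead and a bounds check on every byte.

{code:java}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class BulkVsSingleByte {
  public static void main(String[] args) throws Exception {
    try (RandomAccessFile raf = new RandomAccessFile("segment.dat", "r"); // placeholder file
         FileChannel channel = raf.getChannel()) {
      MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      byte[] block = new byte[128];

      // Block-codec style: one bulk get, eligible for a memcpy-like intrinsic.
      map.position(0);
      map.get(block, 0, block.length);

      // Pre-block-codec style: one call and one bounds check per byte.
      map.position(0);
      for (int i = 0; i < block.length; i++) {
        block[i] = map.get();
      }
    }
  }
}
{code}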
[jira] [Commented] (SOLR-3755) shard splitting
[ https://issues.apache.org/jira/browse/SOLR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550477#comment-13550477 ] Mark Miller commented on SOLR-3755:

This has a back-compat break that we should address somehow, or at least mention in CHANGES: previously you could specify explicit shard ids and still get distributed updates; now if you do that, you won't get distrib updates, as shards won't be assigned ranges.

shard splitting
---------------
Key: SOLR-3755
URL: https://issues.apache.org/jira/browse/SOLR-3755
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Yonik Seeley
Attachments: SOLR-3755.patch, SOLR-3755.patch

We can currently easily add replicas to handle increases in query volume, but we should also add a way to add additional shards dynamically by splitting existing shards.
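To make the routing point concrete, a toy sketch under the assumption that each shard owns a contiguous slice of the signed 32-bit hash space (this is not SolrCloud's actual bookkeeping): splitting halves the parent's slice, and a document routes by where its id hash falls - exactly the lookup that has nothing to consult when shards were named explicitly and never assigned ranges.

{code:java}
public class HashRangeSplit {
  public static void main(String[] args) {
    // Toy values: a shard owning the full signed 32-bit hash range.
    long min = Integer.MIN_VALUE;
    long max = Integer.MAX_VALUE;

    // Splitting produces two sub-shards covering adjacent halves of the parent range.
    long mid = (min + max) / 2;
    System.out.printf("shard1_0: [%d, %d]%n", min, mid);
    System.out.printf("shard1_1: [%d, %d]%n", mid + 1, max);

    // Routing consults the ranges; "doc42" is a hypothetical document id.
    int docHash = "doc42".hashCode();
    System.out.println("doc42 routes to " + (docHash <= mid ? "shard1_0" : "shard1_1"));
  }
}
{code}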
[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]
[ https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550488#comment-13550488 ] Dawid Weiss commented on LUCENE-4678:

This looks very cool! I looked at the patch briefly but I need to apply it to make sense of the whole picture. :)

{code}
+      while (skip > 0) {
+        buffer.writeByte((byte) 0);
+        skip--;
+      }
{code}

This doesn't look particularly efficient, but I didn't get the context where it's actually used from the patch, so maybe it's all right.

FST should use paged byte[] instead of single contiguous byte[]
---------------------------------------------------------------
Key: LUCENE-4678
URL: https://issues.apache.org/jira/browse/LUCENE-4678
Project: Lucene - Core
Issue Type: Improvement
Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.2, 5.0
Attachments: LUCENE-4678.patch, LUCENE-4678.patch

The single byte[] we use today has several limitations, e.g. it limits us to 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and it causes big RAM spikes during building when the array has to grow. I took basically the same approach as LUCENE-3298, but I want to break out this patch separately from changing all int -> long for > 2.1 GB support.
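One possible bulk alternative to the per-byte loop, as a sketch: it assumes the target exposes Lucene's DataOutput.writeBytes(byte[], int, int), which may not match what the patch's paged buffer actually provides.

{code:java}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

final class ZeroFill {
  // Reusable block of zeros; 256 is an arbitrary chunk size for this sketch.
  private static final byte[] ZEROS = new byte[256];

  // Writes 'skip' zero bytes in bulk chunks instead of one writeByte() call per byte.
  static void writeZeros(DataOutput buffer, long skip) throws IOException {
    while (skip > 0) {
      int chunk = (int) Math.min(skip, ZEROS.length);
      buffer.writeBytes(ZEROS, 0, chunk);
      skip -= chunk;
    }
  }
}
{code}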
[jira] [Created] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries
Roman Chyla created LUCENE-4679:

Summary: LowercaseExpandedTermsQueryNodeProcessor changes regex queries
Key: LUCENE-4679
URL: https://issues.apache.org/jira/browse/LUCENE-4679
Project: Lucene - Core
Issue Type: Wish
Reporter: Roman Chyla
Priority: Trivial

This is really a very silly request, but could the lowercase processor 'abstain' from changing regex queries? For example, \\W should stay uppercase, but it will be lowercased.
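A sketch of what 'abstaining' could look like, built on the flexible query parser's processor chain; the override point and node classes are assumptions against the 4.x API, not a tested patch.

{code:java}
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.core.nodes.QueryNode;
import org.apache.lucene.queryparser.flexible.standard.nodes.RegexpQueryNode;
import org.apache.lucene.queryparser.flexible.standard.processors.LowercaseExpandedTermsQueryNodeProcessor;

// Leaves regexp nodes untouched so escapes like \W keep their meaning;
// everything else falls through to the stock lowercasing behavior.
public class RegexAwareLowercaseProcessor extends LowercaseExpandedTermsQueryNodeProcessor {
  @Override
  protected QueryNode postProcessNode(QueryNode node) throws QueryNodeException {
    if (node instanceof RegexpQueryNode) {
      return node; // abstain: do not lowercase regex text
    }
    return super.postProcessNode(node);
  }
}
{code}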
[jira] [Updated] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries
[ https://issues.apache.org/jira/browse/LUCENE-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Chyla updated LUCENE-4679:

    Attachment: LUCENE-4679.patch

LowercaseExpandedTermsQueryNodeProcessor changes regex queries
--------------------------------------------------------------
Key: LUCENE-4679
URL: https://issues.apache.org/jira/browse/LUCENE-4679
Project: Lucene - Core
Issue Type: Wish
Reporter: Roman Chyla
Priority: Trivial
Attachments: LUCENE-4679.patch

This is really a very silly request, but could the lowercase processor 'abstain' from changing regex queries? For example, \\W should stay uppercase, but it will be lowercased.
[jira] [Updated] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries
[ https://issues.apache.org/jira/browse/LUCENE-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Chyla updated LUCENE-4679:

    Description: This is really a very silly request, but could the lowercase processor 'abstain' from changing regex queries? For example, W should stay uppercase, but it will be lowercased.

    was: This is really a very silly request, but could the lowercase processor 'abstain' from changing regex queries? For example, \\W should stay uppercase, but it will be lowercased.

LowercaseExpandedTermsQueryNodeProcessor changes regex queries
--------------------------------------------------------------
Key: LUCENE-4679
URL: https://issues.apache.org/jira/browse/LUCENE-4679
Project: Lucene - Core
Issue Type: Wish
Reporter: Roman Chyla
Priority: Trivial
Attachments: LUCENE-4679.patch

This is really a very silly request, but could the lowercase processor 'abstain' from changing regex queries? For example, W should stay uppercase, but it will be lowercased.
[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550538#comment-13550538 ] Hoss Man commented on LUCENE-4134:

bq. What other script did you have in mind for the maven files?

I just meant whatever we currently do to push them to wherever we push them once the VOTE is official -- if that's currently bundled up in a script that also scp's the files to people.apache.org:/dist, then let's only worry about changing the people.apache.org part to start committing to svn, and worry about switching to RCs in svn and how we upload to maven from there later.

modify release process/scripts to use svn for rc/release publishing (svnpubsub)
-------------------------------------------------------------------------------
Key: LUCENE-4134
URL: https://issues.apache.org/jira/browse/LUCENE-4134
Project: Lucene - Core
Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
Fix For: 4.1

By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be entirely managed using svnpubsub ... our use of the Apache CMS for lucene.apache.org puts us in compliance for our main website, but the dist dir used for publishing release artifacts also needs to be managed via svn.
[jira] [Updated] (SOLR-3982) Admin UI: Various Dataimport Improvements
[ https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3982:

    Attachment: SOLR-3982.patch

After a quick chat with [~elyograg], we decided to show the animated spinner only if auto-refresh is activated; otherwise the user might be confused.

Admin UI: Various Dataimport Improvements
-----------------------------------------
Key: SOLR-3982
URL: https://issues.apache.org/jira/browse/SOLR-3982
Project: Solr
Issue Type: Improvement
Components: web gui
Affects Versions: 4.0
Reporter: Shawn Heisey
Assignee: Stefan Matheis (steffkes)
Fix For: 4.2, 5.0
Attachments: SOLR-3982.patch, SOLR-3982.patch, SOLR-3982.patch

Started with Shawn's request about a small refresh link; one change led to the next, which is why I broadened this issue into a more general one. This patch brings:
* A Refresh Status button
* An Abort Import button
* Improved status handling _(was buggy if you had multiple cores with DataImport handlers defined and switched the view while at least one was running)_
* Additional stats on rows/documents _(on-the-fly calculated X docs/second)_
* Less buggy duration-to-readable-time conversion _(until now this resulted in NaN's showing up on your screen)_

Original Description:
{quote}The dataimport section under each core on the admin gui does not provide a way to get the current import status. I actually would like to see it automatically pull the status as soon as you click on Dataimport ... I have never seen an import status with a qtime above 1 millisecond. A refresh icon/link would be good to have as well. Additional note: the resulting URL in the address bar is a little odd: http://server:port/solr/#/corename/dataimport//dataimport{quote}

A short explanation of why the URL looks a bit odd: the first dataimport is required for the UI to detect which section you're browsing .. the second /dataimport (including the slash, yes) is coming from your solrconfig :)
Re: 4.1 release
The window of Monday through Wednesday sounds like a great target. Nothing says that the first RC has to be final. If whoever is doing the branch wants to do it on Monday rather than Tuesday, fine. If one or more of these nasty blockers gets fixed on Tuesday, we should still be open to a re-spin, putting quality over a mere day or two of delay. But draw a hard line on Wednesday.

-- Jack Krupansky

-----Original Message----- From: Mark Miller Sent: Thursday, January 10, 2013 3:36 PM To: dev@lucene.apache.org Subject: Re: 4.1 release

Saying "tomorrow", without any date that gives anyone any time to do anything, is out of nowhere to me. People in Europe and east of that will wake up and find out: oh, today. While pressure has been building towards a release, no one has proposed a date for a cutoff, and I think that is only fair. If you were desperate to cut off to blockers tomorrow, you should have called for that last week.

Robert Muir's short-term releases are not threatened by allowing people to plan and execute a release together. You can take that too far and do damage from the opposite direction. Giving people time to tie things up, with a real deadline, is only fair. We all know a nebulous deadline is not conducive to finishing up work. I think all releases should have a known date that we agree on, one that gives developers some time to finish what they are working on or what they believe is important for the release. At a minimum there should be a few days for this; a weekend involved only seems fair. This doesn't have to be a long time, but it should not require that we file blockers, and it just seems like a friendly way to develop together.

Monday is fine by me if others buy into it. Otherwise: we have taken 4 or 5 months for 4.1; let's not drag it out another month. But let's not do the reverse and release it tonight. The sensible approach always seems to be to plan out some target dates on the list - dates that actually give devs a chance to respond - and then follow through on those dates.

- Mark

On Jan 10, 2013, at 3:26 PM, Steve Rowe sar...@gmail.com wrote:

Okay - I can see your logic, Mark, but this is not even close to out of nowhere. You yourself have been vocal about making a 4.1 release for a couple of weeks now. I agree with Robert Muir that we should be promoting short-turnaround releases. If it doesn't make this release, it'll make the next one, which will come out in a relatively short span of time. In this model, Blocker issues are the drivers, not Fix Version. If people want stuff in the release, they should mark their issue as Blocker. How about a compromise - next Monday we branch and only allow Blockers to block the release?

Steve

On Jan 10, 2013, at 3:08 PM, Mark Miller markrmil...@gmail.com wrote:

-1 from me - I don't like not giving people a target date to clean things up by. No one has given a proposed date to try and tie things up by - just calling 'the hike is tomorrow' out of nowhere doesn't seem right to me. We have a lot of people working on this over a lot of timezones. I think we should do the right thing and give everyone at least a few days and a weekend to finish getting their issues into 4.1.

- Mark

On Jan 10, 2013, at 2:36 PM, Steve Rowe sar...@gmail.com wrote:

I'd like to start sooner than next Tuesday. I propose to make the branch tomorrow, and only allow Blocker issues to hold up the release after that. A release candidate should then be possible by the middle of next week.

Steve

On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote:

On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote:

I'd like to release soon. What else blocks this? I think we should toss out a short-term date (next Tuesday?) for anyone to get in what they need for 4.1. Then just consider blockers after branching? Then release? Objections, better ideas?

I think we should give a bit of time for people to finish up what's in flight or fix any blockers. Then we should heighten testing and allow for any new blockers, and then kick it out. If we need to do a 4.2 shortly after, so be it.

- Mark
[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550551#comment-13550551 ] Steve Rowe commented on LUCENE-4134:

bq. personally I would prefer if we don't have a separate script for changing the maven files. I'm not really sure what this tester is currently doing.

s/changing/checking/ ?

Here's what the maven artifact checking portion of the smoke tester currently does:
# Downloads the POM templates from the branch tag in Subversion (for later checking that all checked-in POM templates have corresponding artifacts)
# Downloads all the files under the {{maven/}} directories at the RC location
# Verifies that there is a deployed POM for each binary jar/war
# Verifies there is a binary jar for each POM template
# Verifies that the md5/sha1 digests for each Maven jar/war exist and are correct
# Verifies there is a source and a javadocs jar for each binary jar
# Verifies that each deployed POM's artifactId/groupId (pulled from the POM) matches the POM's dir+filename
# Verifies that there is a binary jar for each deployed POM
# Downloads and unpacks the official distributions, and also unpacks the Solr war
# Verifies that the Maven binary artifacts have same-named files (after adding apache- to the Maven Solr jars/war)

There are a couple of additional steps in there to handle non-Mavenized dependencies, which we don't have any of anymore; those steps could be removed.

bq. It's scary to me that different build systems are producing different artifacts

*All* the Maven artifacts are produced by Ant, not by Maven and not by maven-ant-tasks.

bq. And I know the checking isn't good enough when I see basic shit like things not even named the same way: SOLR-4287

maven-ant-tasks renames the Solr artifacts based on the Maven jar naming convention: artifactId-version(-type).jar; the groupId org.apache.solr is not included. This has been the Solr Maven artifact naming scheme since Solr artifacts started being published on the Maven central repository (v1.3). Using the Solr naming convention would result in the coordinates {{org.apache.solr.apache-solr.*}}, or maybe even {{org.apache.apache-solr:apache-solr.*}}, both of which look goofy to me. I *think* Maven can technically handle artifact naming schemes that differ from artifactId-version(-type).jar, but I've never done that before, and I personally don't think it's worth the effort, especially given the IMHO goofy result. Before SOLR-4287, I hadn't seen anybody complain. (If you look at SOLR-4287, by the way, the suggestion isn't to change the Maven naming, it's to change the official Solr artifact naming.)

modify release process/scripts to use svn for rc/release publishing (svnpubsub)
-------------------------------------------------------------------------------
Key: LUCENE-4134
URL: https://issues.apache.org/jira/browse/LUCENE-4134
Project: Lucene - Core
Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
Fix For: 4.1

By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be entirely managed using svnpubsub ... our use of the Apache CMS for lucene.apache.org puts us in compliance for our main website, but the dist dir used for publishing release artifacts also needs to be managed via svn.
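The md5/sha1 verification step above boils down to hashing each artifact and comparing against its .md5/.sha1 sidecar file. The smoke tester itself is a Python script; the following is a rough Java rendering of the same check, with a hypothetical artifact path supplied as the argument.

{code:java}
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class DigestCheck {
  public static void main(String[] args) throws Exception {
    String artifact = args[0]; // e.g. some-artifact-4.1.jar (hypothetical name)
    // Sidecar files usually hold "<hex digest>" or "<hex digest>  <filename>".
    String expected = new String(Files.readAllBytes(Paths.get(artifact + ".sha1")),
        StandardCharsets.UTF_8).trim().split("\\s+")[0];

    MessageDigest digest = MessageDigest.getInstance("SHA-1");
    try (InputStream in = new FileInputStream(artifact)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        digest.update(buf, 0, n);
      }
    }

    StringBuilder actual = new StringBuilder();
    for (byte b : digest.digest()) {
      actual.append(String.format("%02x", b));
    }
    System.out.println(actual.toString().equals(expected) ? "OK" : "MISMATCH");
  }
}
{code}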
[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)
[ https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550555#comment-13550555 ] Steve Rowe commented on LUCENE-4134:

{quote}
bq. What other script did you have in mind for the maven files?

I just meant whatever we currently do to push them to wherever we push them once the VOTE is official -- if that's currently bundled up in a script that also scp's the files to people.apache.org:/dist, then let's only worry about changing the people.apache.org part to start committing to svn, and worry about switching to RCs in svn and how we upload to maven from there later.
{quote}

The process is here: [http://wiki.apache.org/lucene-java/PublishMavenArtifacts]. It's a two-step process: first an Ant task stages the artifacts to the Nexus repository at {{repository.apache.org}}. Then, when the VOTE succeeds, the RM clicks a button on the Nexus web interface to publish them, and a few hours later they get synch'd to the Maven central repository.

modify release process/scripts to use svn for rc/release publishing (svnpubsub)
-------------------------------------------------------------------------------
Key: LUCENE-4134
URL: https://issues.apache.org/jira/browse/LUCENE-4134
Project: Lucene - Core
Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
Fix For: 4.1

By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be entirely managed using svnpubsub ... our use of the Apache CMS for lucene.apache.org puts us in compliance for our main website, but the dist dir used for publishing release artifacts also needs to be managed via svn.