[jira] [Commented] (LUCENE-4669) Document wrongly deleted from index

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549495#comment-13549495
 ] 

Adrien Grand commented on LUCENE-4669:
--

Hi Miguel,

bq. One more question: what's the best way to iterate over all documents in an 
index?

Retrieving stored fields for all documents in an index is something Lucene is 
bad at (it doesn't optimize for this use-case on purpose), and you should try 
to avoid doing it. Otherwise, iterating over all doc ids from 0 to ir.maxDoc(), 
skipping deleted documents (liveDocs != null && !liveDocs.get(docID)) and 
calling IndexReader.document(docID) should work.
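
For concreteness, a minimal sketch of that loop against the Lucene 4.0 API (the 
method name and the "path" field are placeholders taken from this issue; 
MultiFields.getLiveDocs returns null when the index has no deletions):

{code}
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.Bits;

static void printAll(IndexReader reader) throws IOException {
  Bits liveDocs = MultiFields.getLiveDocs(reader); // null if no deletions
  for (int docID = 0; docID < reader.maxDoc(); docID++) {
    if (liveDocs != null && !liveDocs.get(docID)) {
      continue; // skip deleted documents
    }
    Document doc = reader.document(docID); // loads the stored fields
    System.out.println(doc.get("path"));
  }
}
{code}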

Please ask questions on the user mailing-list instead of JIRA in the future.

 Document wrongly deleted from index
 ---

 Key: LUCENE-4669
 URL: https://issues.apache.org/jira/browse/LUCENE-4669
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
 Environment: OS = Mac OS X 10.7.5
 Java = JVM 1.6
Reporter: Miguel Ferreira

 I'm trying to implement document deletion from an index.
 If I create an index with three documents (A, B and C) and then try to delete 
 A, A gets marked as deleted but C is removed from the index. I've tried this 
 with different numbers of documents and saw that it is always the last 
 document that is removed.
 When I run the example unit test code below I get this output:
 {code}
 Before delete
 Found 3 documents
 Document at = 0; isDeleted = false; path = a; 
 Document at = 1; isDeleted = false; path = b; 
 Document at = 2; isDeleted = false; path = c; 
 After delete
 Found 2 documents
 Document at = 0; isDeleted = true; path = a; 
 Document at = 1; isDeleted = false; path = b; 
 {code}
 Example unit test:
 {code:title=ExampleUnitTest.java}
 @Test
 public void delete() throws Exception {
     File indexDir = FileUtils.createTempDir();
     IndexWriter writer = new IndexWriter(new NIOFSDirectory(indexDir),
         new IndexWriterConfig(Version.LUCENE_40,
             new StandardAnalyzer(Version.LUCENE_40)));
     Document doc = new Document();
     String fieldName = "path";
     doc.add(new StringField(fieldName, "a", Store.YES));
     writer.addDocument(doc);
     doc = new Document();
     doc.add(new StringField(fieldName, "b", Store.YES));
     writer.addDocument(doc);
     doc = new Document();
     doc.add(new StringField(fieldName, "c", Store.YES));
     writer.addDocument(doc);
     writer.commit();
     System.out.println("Before delete");
     print(indexDir);
     writer.deleteDocuments(new Term(fieldName, "a"));
     writer.commit();
     System.out.println("After delete");
     print(indexDir);
 }

 public static void print(File indexDirectory) throws IOException {
     DirectoryReader reader = DirectoryReader.open(new NIOFSDirectory(indexDirectory));
     Bits liveDocs = MultiFields.getLiveDocs(reader);
     int numDocs = reader.numDocs();
     System.out.println("Found " + numDocs + " documents");
     for (int i = 0; i < numDocs; i++) {
         Document document = reader.document(i);
         StringBuffer sb = new StringBuffer();
         sb.append("Document at = ").append(i);
         sb.append("; isDeleted = ").append(liveDocs != null ? !liveDocs.get(i) : false).append("; ");
         for (IndexableField field : document.getFields()) {
             String fieldName = field.name();
             for (String value : document.getValues(fieldName)) {
                 sb.append(fieldName).append(" = ").append(value).append("; ");
             }
         }
         System.out.println(sb.toString());
     }
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 57 - Failure!

2013-01-10 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/57/
Java: 64bit/jdk1.6.0 -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 24615 lines...]
BUILD FAILED
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-4.x-MacOSX/build.xml:60: The 
following error occurred while executing this line:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-4.x-MacOSX/lucene/build.xml:310:
 The following error occurred while executing this line:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-4.x-MacOSX/lucene/common-build.xml:1920:
 javax.script.ScriptException: javax.script.ScriptException: 
org.parboiled.errors.ParserRuntimeException: Error while parsing action 
'Root/Sequence/ZeroOrMore/Sequence/Block/FirstOf/Heading/FirstOf/AtxHeading/OneOrMore/Sequence/AtxInline/Inline/Inline_Action1'
 at input position (line 1, pos 3):
# JRE Version Migration Guide
  ^

org.pegdown.ParsingTimeoutException
at 
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.eval(GroovyScriptEngineImpl.java:138)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:247)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.tools.ant.util.ReflectUtil.invoke(ReflectUtil.java:108)
at 
org.apache.tools.ant.util.ReflectWrapper.invoke(ReflectWrapper.java:81)
at 
org.apache.tools.ant.util.optional.JavaxScriptRunner.evaluateScript(JavaxScriptRunner.java:103)
at 
org.apache.tools.ant.util.optional.JavaxScriptRunner.executeScript(JavaxScriptRunner.java:67)
at 
org.apache.tools.ant.types.optional.ScriptFilter.filter(ScriptFilter.java:110)
at org.apache.tools.ant.filters.TokenFilter.read(TokenFilter.java:114)
at 
org.apache.tools.ant.filters.BaseFilterReader.read(BaseFilterReader.java:83)
at java.io.BufferedReader.read1(BufferedReader.java:185)
at java.io.BufferedReader.read(BufferedReader.java:261)
at 
org.apache.tools.ant.util.ResourceUtils.copyResource(ResourceUtils.java:494)
at org.apache.tools.ant.util.FileUtils.copyFile(FileUtils.java:559)
at org.apache.tools.ant.taskdefs.Copy.doFileOperations(Copy.java:875)
at org.apache.tools.ant.taskdefs.Copy.execute(Copy.java:549)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at 
org.apache.tools.ant.taskdefs.MacroInstance.execute(MacroInstance.java:398)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:390)
at org.apache.tools.ant.Target.performTasks(Target.java:411)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
at 
org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:302)
at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 

[jira] [Created] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-4674:


 Summary: Consistently set offset=0 in BytesRef.copyBytes
 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


BytesRef.copyBytes(BytesRef other) has two branches:
 - either the destination array is large enough and it will copy bytes after 
offset,
 - or it needs to resize and in that case it will set offset = 0.

I think this method should always set offset = 0 for consistency, and to avoid 
resizing when other.length is larger than this.bytes.length - this.offset but 
smaller than this.bytes.length.
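
In code, the behavior being described looks roughly like this (a paraphrase of 
the two branches above, not the actual BytesRef source):

{code}
public void copyBytes(BytesRef other) {
  if (bytes.length - offset < other.length) {
    // branch 2: not enough room after offset -> allocate and reset offset
    bytes = new byte[other.length];
    offset = 0;
  }
  // branch 1 (and the tail of branch 2): copy the bytes after offset
  System.arraycopy(other.bytes, other.offset, bytes, offset, other.length);
  length = other.length;
}
{code}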




[jira] [Updated] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4674:
-

Attachment: LUCENE-4674.patch

Patch. Additionally I added a call to ArrayUtil.oversize to make resizing less 
likely.




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549533#comment-13549533
 ] 

Robert Muir commented on LUCENE-4674:
-

I don't really agree (I don't think this class should be treated like a 
StringBuffer).

Changing offset to 0 is fine when we make a new array; otherwise it is 
definitely and 100% certainly NOT OK, as we may overwrite unrelated data.




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549534#comment-13549534
 ] 

Robert Muir commented on LUCENE-4674:
-

Moreover, any proposed changes here should also include the changes to IntsRef, 
LongsRef, CharsRef, and so on before even being considered.

Otherwise the APIs just get out of whack.

Maybe we should seriously consider just switching to java.nio.Buffer.




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549536#comment-13549536
 ] 

Uwe Schindler commented on LUCENE-4674:
---

I agree with Robert. We had BytesRef and CharsRef doing that stuff in the past. 
But as the name of the class is *Ref, not *Buffer, it should only hold a 
reference to a byte[] and not change or grow it. Especially, it should not 
change offset. This is risky: if you get a BytesRef that points to some slice 
in a larger buffer and you suddenly resize it, you invalidate content that 
might be needed by other stuff (e.g. while iterating terms, the previous/next 
term gets corrupted).

I would in any case favour using ByteBuffer instead of this unsafe and 
incomplete duplicate. For user-facing APIs, BytesRef is a mess.




[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4620:
---

Attachment: LUCENE-4620.patch

Patch makes the following changes:

* {{IntEncoder.encode()}} takes an {{IntsRef}} and {{BytesRef}} and encodes the 
integers from {{IntsRef}} to {{BytesRef}}. Similarly, {{IntDecoder.decode()}} 
takes a {{BytesRef}} and {{IntsRef}} and decodes the integers from the byte 
array to the integer array (see the sketch after this list).

* {{CategoryListIterator}} and {{Aggregator}} were changed to do bulk handling 
of category ordinals as well.

* In the process I merged some methods such as {{PayloadIterator.setdoc}} and 
{{PayloadIterator.getPayload}}, as well as {{AssociationsPayloadIterator}}, to 
reduce even further the number of method calls that happen during search.

* Added a test which tests MultiCategoryListIterator (we didn't have one!) and 
improved EncodingTest to test a large number of random values.

All tests pass, and 'ant javadocs' passes too.
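
As referenced in the first bullet above, a hypothetical sketch of that bulk 
encode shape using plain VInts (names and the exact signature are illustrative, 
not the committed API; it assumes {{buf}} owns a fresh array pre-sized to 5 
bytes per value):

{code}
static void encode(IntsRef values, BytesRef buf) {
  int upto = 0;
  for (int i = values.offset; i < values.offset + values.length; i++) {
    int v = values.ints[i];
    while ((v & ~0x7F) != 0) {        // VInt: 7 payload bits, high bit = "more"
      buf.bytes[upto++] = (byte) ((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    buf.bytes[upto++] = (byte) v;
  }
  buf.offset = 0;
  buf.length = upto;
}
{code}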

 Explore IntEncoder/Decoder bulk API
 ---

 Key: LUCENE-4620
 URL: https://issues.apache.org/jira/browse/LUCENE-4620
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-4620.patch


 Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
 and decode(int). Originally, we believed that this layer can be useful for 
 other scenarios, but in practice it's used only for writing/reading the 
 category ordinals from payload/DV.
 Therefore, Mike and I would like to explore a bulk API, something like 
 encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
 can still be streaming (as we don't know in advance how many ints will be 
 written), dunno. Will figure this out as we go.
 One thing to check is whether the bulk API can work w/ e.g. facet 
 associations, which can write arbitrary byte[], and so decoding to an 
 IntsRef may not make sense. This too we'll figure out as we go. I don't rule 
 out that associations will use a different bulk API.
 At the end of the day, the requirement is for someone to be able to configure 
 how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
 etc.) and later read, with as little overhead as possible.




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549538#comment-13549538
 ] 

Adrien Grand commented on LUCENE-4674:
--

I still find it confusing that we are allowed to write past offset + length but 
not before offset.

Switching to the java.nio buffers sounds good.




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549542#comment-13549542
 ] 

Shai Erera commented on LUCENE-4674:


I recently (LUCENE-4620) moved some facets code to use BytesRef and IntsRef and 
found these two classes very convenient. The only thing that I found missing is 
a *Ref.upto. E.g., I first made the mistake {{for (int i = bytes.offset; i < 
bytes.length; i++)}}, where the correct form is {{for (int i = bytes.offset; i 
< bytes.length + bytes.offset; i++)}} (but then you need to do that '+' at 
every iteration, or extract it to a variable).
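
In code, the corrected loop with that '+' extracted to a variable (a trivial 
sketch):

{code}
static int sum(BytesRef ref) {
  final int upto = ref.offset + ref.length; // computed once, not per iteration
  int sum = 0;
  for (int i = ref.offset; i < upto; i++) {
    sum += ref.bytes[i] & 0xFF;
  }
  return sum;
}
{code}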

I considered using ByteBuffer instead, but as long as e.g. a Payload is 
represented as a BytesRef, it's a waste to always 
ByteBuffer.wrap(BytesRef.bytes, offset, length). I used BytesRef as it was very 
convenient (and if we add an 'upto' index to them, that'd be even greater :)).

I agree that grow() is currently risky, as it may overwrite some data that is 
used by another thread (as a slice of the buffer). But that can be solved with 
proper documentation, I think.

I also agree that we should not set offset to 0. I did that, and MemoryCodec 
got upset :). For all practical purposes, apps should treat offset and length 
as final (we should not make them final though, just document it). If an app 
messes with them, it had better know what it's doing.




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549543#comment-13549543
 ] 

Robert Muir commented on LUCENE-4674:
-

The whole class is confusing.

But the problem with this proposed change is very simple:
BytesRef a = new BytesRef(bigbyte, 0, 5);
BytesRef b = new BytesRef(bigbyte, 5, 10);

b.copy(someOtherStuff...) should *NOT* muck with a.

A is unrelated to B.

I think realistically we should avoid methods like append/copy altogether, as 
they encourage more stringbuffer-type use like this.

If you want a stringbuffer-type class, it can safely support methods like 
this, but then it should *own the array* (make a copy).
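
A self-contained demo of that trap against the Lucene 4.x API (the resize 
threshold here follows the behavior described in this issue's summary):

{code}
byte[] bigbyte = new byte[15];
BytesRef a = new BytesRef(bigbyte, 0, 5);
BytesRef b = new BytesRef(bigbyte, 5, 10);

// 7 > a.length, but 7 <= bigbyte.length - a.offset, so no reallocation
// happens: the copy runs past a's slice and clobbers the start of b's.
a.copyBytes(new BytesRef(new byte[] { 9, 9, 9, 9, 9, 9, 9 }));
System.out.println(b.bytes[b.offset]); // prints 9 -- "unrelated" b changed
{code}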





[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549544#comment-13549544
 ] 

Adrien Grand commented on LUCENE-4674:
--

bq. b.copy(someOtherStuff...) should NOT muck with a.

Unfortunately a.copy(otherStuff) will modify b if otherStuff.length > 5.




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549554#comment-13549554
 ] 

Robert Muir commented on LUCENE-4674:
-

I will open a new issue to remove all write methods from BytesRef.

This is a ref class, not a StringBuilder. We have to keep these APIs contained.




[jira] [Resolved] (LUCENE-4670) Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new formats easier

2013-01-10 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-4670.
--

Resolution: Fixed

 Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new 
 formats easier
 --

 Key: LUCENE-4670
 URL: https://issues.apache.org/jira/browse/LUCENE-4670
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.1

 Attachments: LUCENE-4670.patch, LUCENE-4670.patch, LUCENE-4670.patch, 
 LUCENE-4670.patch


 This is especially useful to LUCENE-4599 where actions have to be taken after 
 a doc/field/term has been added.




Re: [jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Erick Erickson
Haven't run across "play up" in this context (I was raised on the wrong side
of the Atlantic), but three definitions I found _all_ apply:

1 *Brit* *informal* to behave irritatingly (towards)
2 *(intr)* *Brit* *informal* (of a machine, car, etc.) to function
erratically
3 *Brit* *informal* to hurt; give (one) pain or trouble

I don't think I've found another two-word phrase that packs in that many
varieties of how computers are mean to me so efficiently. Gotta add that
one to my vocabulary.


On Wed, Jan 9, 2013 at 2:40 PM, Greg Bowyer (JIRA) j...@apache.org wrote:


 [
 https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548885#comment-13548885]

 Greg Bowyer commented on LUCENE-3178:
 -

 Frustrating, it echoes what I have been seeing, so at least my benchmarking
 is not playing me up. I guess I will have to do some digging.

  Native MMapDir
  --
 
  Key: LUCENE-3178
  URL: https://issues.apache.org/jira/browse/LUCENE-3178
  Project: Lucene - Core
   Issue Type: Improvement
   Components: core/store
 Reporter: Michael McCandless
   Labels: gsoc2012, lucene-gsoc-12
  Attachments: LUCENE-3178-Native-MMap-implementation.patch,
 LUCENE-3178-Native-MMap-implementation.patch,
 LUCENE-3178-Native-MMap-implementation.patch
 
 
  Spinoff from LUCENE-2793.
  Just like we will create native Dir impl (UnixDirectory) to pass the
 right OS level IO flags depending on the IOContext, we could in theory do
 something similar with MMapDir.
  The problem is MMap is apparently quite hairy... and to pass the flags
 the native code would need to invoke mmap (I think?), unlike UnixDir where
 the code only has to open the file handle.





[jira] [Commented] (LUCENE-4670) Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new formats easier

2013-01-10 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549562#comment-13549562
 ] 

Commit Tag Bot commented on LUCENE-4670:


[branch_4x commit] Adrien Grand
http://svn.apache.org/viewvc?view=revision&revision=1431294

LUCENE-4670: Add finish* callbacks to StoredFieldsWriter and TermVectorsWriter.






Re: [jira] [Commented] (SOLR-4112) Dataimporting with SolrCloud Fails

2013-01-10 Thread Erick Erickson
Sausarkar:

When you say the index went from 14G to 7G, did you notice whether the
difference was in the *.fdt and *.fdx files? That would be due to
compression of stored fields, which is now the default. If you could,
would you let us know the sizes of the files with those two extensions
before and after? I'm trying to gather real-world examples...

But about your slowdown: does the same thing happen if you specify
fl=score (and ensure that lazy load is configured in solrconfig.xml)? I
don't think that would be reading the fields off disk and decompressing
them...

What are you measuring? Total time to return to the client? It'd also help
pin this down if you looked just at QTime in the responses; that should be
exclusive of time to assemble the documents, it's purely searching.

Thanks,
Erick


On Wed, Jan 9, 2013 at 8:50 PM, sausarkar sausar...@ebay.com wrote:

 We are using solr-meter for generating query load of around 110 Queries per
 second per node.

 With 4.1 the average query time is 300 msec; if we switch to 4.0 the
 average query time is around 11 msec. We used the same load test params and
 the same 10 million records; the only differences are the version and index
 files: 4.1 has 7GB and 4.0 has 14GB.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/jira-Created-SOLR-4112-Dataimporting-with-SolrCloud-Fails-tp4022365p4032084.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.





[jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549567#comment-13549567
 ] 

Uwe Schindler commented on LUCENE-3178:
---

I think this is largely related to Robert's comment:
bq. Might be interesting to revisit now that we use block compression that 
doesn't readByte(), readByte(), readByte() and hopefully avoids some of the 
bounds checks and so on that I think it helped with.

Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.

Additionally, when using the ByteBuffer methods to get bytes, I think newer 
Java versions use intrinsics that may no longer be used with your directory 
impl.

I would not provide a custom MMapDir at all; it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
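
To illustrate the bulk-versus-single-byte point with plain NIO (a generic 
illustration, not Lucene code):

{code}
ByteBuffer mapped = ByteBuffer.allocateDirect(4096); // stand-in for a mapped region
byte[] block = new byte[128];

mapped.get(block);                 // bulk read: essentially one memcpy

mapped.rewind();
for (int i = 0; i < block.length; i++) {
  block[i] = mapped.get();         // 128 bounds-checked single-byte calls
}
{code}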




[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549568#comment-13549568
 ] 

Michael McCandless commented on LUCENE-4620:


Looks like there were some svn mv's, so the patch doesn't directly apply ...

Can you regenerate the patch using 'svn diff --show-copies-as-adds' (assuming 
you're using svn 1.7+)?

Either that or use dev-tools/scripts/diffSources.py ... thanks.




[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4620:
---

Attachment: LUCENE-4620.patch

Sorry. Can you try now?




[jira] [Created] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4675:
---

 Summary: remove *Ref.copy/append/grow
 Key: LUCENE-4675
 URL: https://issues.apache.org/jira/browse/LUCENE-4675
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


These methods are dangerous:

In general, if we want a StringBuilder-type class, then it should own the 
array, and it can freely do allocation stuff etc. This is the only way to make 
it safe.

Otherwise, if we want a ByteBuffer-type class, then its reference should be 
immutable (the byte[]/offset/length should be final), and it should not have 
allocation stuff.

BytesRef is none of these; it's like a C pointer. Unfortunately Lucene puts 
these unsafe, dangerous, trappy APIs directly in front of the user.

What happens if I have a bug in my application and it accidentally mucks with 
the term bytes returned by TermsEnum or the payloads from DocsAndPositionsEnum? 
Will this get merged into a corrupt index?

I think as a start we should remove these copy/append/grow methods to bring 
this closer to a ref class (e.g. more like java.lang.ref and less like 
StringBuilder). Nobody needs this stuff on BytesRef; they can already operate 
on the bytes directly.
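
For contrast, a hypothetical sketch of the "ByteBuffer type" alternative 
described above, where the reference itself is immutable and "writes" produce 
a new ref over owned storage (the class name is illustrative):

{code}
final class ImmutableBytesRef {
  final byte[] bytes;
  final int offset;
  final int length;

  ImmutableBytesRef(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  // "Writing" means building a new ref over a freshly owned copy,
  // never mutating shared storage in place.
  ImmutableBytesRef deepCopy() {
    byte[] copy = new byte[length];
    System.arraycopy(bytes, offset, copy, 0, length);
    return new ImmutableBytesRef(copy, 0, length);
  }
}
{code}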




[jira] [Created] (LUCENE-4676) IndexReader.isCurrent race

2013-01-10 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4676:
---

 Summary: IndexReader.isCurrent race
 Key: LUCENE-4676
 URL: https://issues.apache.org/jira/browse/LUCENE-4676
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


Revision: 1431169

ant test  -Dtestcase=TestNRTManager 
-Dtests.method=testThreadStarvationNoDeleteNRTReader 
-Dtests.seed=925ECD106FBFA3FF -Dtests.slow=true -Dtests.locale=fr_CA 
-Dtests.timezone=America/Kentucky/Louisville -Dtests.file.encoding=US-ASCII 
-Dtests.dups=500




[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549589#comment-13549589
 ] 

Shai Erera commented on LUCENE-4675:


I kinda like grow(). Will I be able to grow() the buffer from the outside if 
you remove it? I.e. will the byte[] not be final?




[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549590#comment-13549590
 ] 

Robert Muir commented on LUCENE-4675:
-

I'm proposing removing these 3 methods from BytesRef itself, that's all.

The guy from the outside knows what he can do: he knows if the bytes actually 
point to a slice of a PagedBytes (grow is actually senseless here!), or just a 
simple byte[], or whatever. He doesn't need BytesRef itself to do these things.

So he can then change the ref to point at a different slice, or a different 
byte[] altogether, or whatever.




[jira] [Resolved] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-4674.
--

Resolution: Won't Fix




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549596#comment-13549596
 ] 

Robert Muir commented on LUCENE-4674:
-

{quote}
Unfortunately a.copy(otherStuff) will modify b if otherStuff.length > 5.
{quote}

I still like the idea of fixing this myself (maybe Shai's idea?). I don't like 
this kind of dangerous stuff!!

I ultimately think LUCENE-4675 is the next logical step, but can we remove this 
a.copy()-overwrites-b trap as an incremental improvement?

That's a bug in my opinion.




[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549598#comment-13549598
 ] 

Shai Erera commented on LUCENE-4675:


OK. While you're at it, what do you think about adding an 'upto' member for 
easier iteration over the bytes/ints/chars? (See my comment on LUCENE-4674.)




[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549611#comment-13549611
 ] 

Robert Muir commented on LUCENE-4675:
-

I don't think we need any additional members in this thing. What more does it 
need other than byte[], offset, length?!

I want to remove the extraneous stuff. If you want to make an iterator, you can 
separately make your own BytesRefIterator class?




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549625#comment-13549625
 ] 

Adrien Grand commented on LUCENE-4674:
--

bq. I still like the idea of fixing this myself (maybe Shai's idea?). I don't 
like this kind of dangerous stuff!!

The 'upto' idea, or allocating a new byte[] if someOtherStuff offset + length > 
this.offset + length?

bq. I ultimately think LUCENE-4675 is the next logical step, but can we remove 
this a.copy()-overwrites-b trap as an incremental improvement?

Regarding the idea to switch to the java.nio buffers, are there some traps 
besides backward compatibility? Should we start migrating our internal APIs to 
this API (and maybe even the public ones for 5.0)?




[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549630#comment-13549630
 ] 

Robert Muir commented on LUCENE-4674:
-

{quote}
allocating a new byte[] if someOtherStuff offset + length > this.offset + 
length?
{quote}

This, preventing a.copy(otherStuff) from overflowing onto b.

I don't want any other functionality in this class. It needs less, not more.

{quote}
Regarding the idea to switch to the java.nio buffers, are there some traps 
besides backward compatibility? Should we start migrating our internal APIs to 
this API (and maybe even the public ones for 5.0)?
{quote}

I haven't even thought about it really. I actually am less concerned about our 
internal APIs.

It's the public ones I care about.

I would care a lot less about BytesRef & co if users weren't forced to 
interact with them.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549632#comment-13549632
 ] 

Shai Erera commented on LUCENE-4675:


bq. you can separately make your own BytesRefIterator class

I can. I wanted to avoid additional object allocations, but such an Iterator 
class can have a reset(BytesRef) method which will update pos and upto members 
accordingly. I was thinking that an 'upto' index might be useful for others. 
For my purposes (see LUCENE-4620) I just use bytes.offset as 'pos' and compute 
an 'upto' and passes it along. I will think about the Iterator class though, 
perhaps it's not a bad idea. And maybe *Ref can have an iterator() method which 
returns the proper one ... or not.
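
A minimal sketch of the iterator being discussed (all names hypothetical): it 
wraps a BytesRef and keeps its own 'pos' and 'upto' cursors, so iteration 
never mutates bytes.offset.

{code}
class BytesRefIterator {
    private BytesRef bytes;
    private int pos;  // current read position
    private int upto; // exclusive end: offset + length

    void reset(BytesRef bytes) {
        this.bytes = bytes;
        this.pos = bytes.offset;
        this.upto = bytes.offset + bytes.length;
    }

    boolean hasNext() {
        return pos < upto;
    }

    byte next() {
        return bytes.bytes[pos++];
    }
}
{code}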

 remove *Ref.copy/append/grow
 

 Key: LUCENE-4675
 URL: https://issues.apache.org/jira/browse/LUCENE-4675
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 These methods are dangerous:
 In general, if we want a StringBuilder-type class, then it should own the 
 array, and it can freely do allocation stuff etc.; this is the only way to 
 make it safe.
 Otherwise, if we want a ByteBuffer-type class, then its reference should be 
 immutable (the byte[]/offset/length should be final), and it should not have 
 allocation stuff.
 BytesRef is none of these; it's like a C pointer. Unfortunately Lucene puts 
 these unsafe, dangerous, trappy APIs directly in front of the user.
 What happens if I have a bug in my application and it accidentally mucks with 
 the term bytes returned by TermsEnum or the payloads from 
 DocsAndPositionsEnum? Will this get merged into a corrupt index?
 I think as a start we should remove these copy/append/grow methods to bring 
 this closer to a ref class (e.g. more like java.lang.ref and less like 
 StringBuilder). Nobody needs this stuff on BytesRef; they can already operate 
 on the bytes directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549637#comment-13549637
 ] 

Robert Muir commented on LUCENE-4675:
-

I don't think we should add more functionality to these *Ref classes: they have 
too many traps and bugs already.

Less is more here.

 remove *Ref.copy/append/grow
 

 Key: LUCENE-4675
 URL: https://issues.apache.org/jira/browse/LUCENE-4675
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 These methods are dangerous:
 In general, if we want a StringBuilder-type class, then it should own the 
 array, and it can freely do allocation stuff etc.; this is the only way to 
 make it safe.
 Otherwise, if we want a ByteBuffer-type class, then its reference should be 
 immutable (the byte[]/offset/length should be final), and it should not have 
 allocation stuff.
 BytesRef is none of these; it's like a C pointer. Unfortunately Lucene puts 
 these unsafe, dangerous, trappy APIs directly in front of the user.
 What happens if I have a bug in my application and it accidentally mucks with 
 the term bytes returned by TermsEnum or the payloads from 
 DocsAndPositionsEnum? Will this get merged into a corrupt index?
 I think as a start we should remove these copy/append/grow methods to bring 
 this closer to a ref class (e.g. more like java.lang.ref and less like 
 StringBuilder). Nobody needs this stuff on BytesRef; they can already operate 
 on the bytes directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.6.0) - Build # 70 - Failure!

2013-01-10 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/70/
Java: 64bit/jdk1.6.0 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 8383 lines...]
[junit4:junit4] ERROR: JVM J0 ended with an exception, command line: 
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java 
-XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/heapdumps
 -Dtests.prefix=tests -Dtests.seed=BC09482A7937D842 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.locale=random -Dtests.timezone=random 
-Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz 
-Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/testlogging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dfile.encoding=ISO-8859-1 -classpath 

Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.6.0) - Build # 70 - Failure!

2013-01-10 Thread Robert Muir
JVM Crash:

[junit4:junit4] Suite: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest
[junit4:junit4] Completed in 32.12s, 1 test
[junit4:junit4]
[junit4:junit4] JVM J0: stdout was not empty, see:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20130110_132632_493.sysout
[junit4:junit4]  JVM J0: stdout (verbatim) 
[junit4:junit4] Invalid memory access of location 0x0 rip=0x7fff8f93db43
[junit4:junit4]  JVM J0: EOF 
[junit4:junit4] Execution time total: 18 minutes 36 seconds


On Thu, Jan 10, 2013 at 8:45 AM, Policeman Jenkins Server
jenk...@thetaphi.de wrote:
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/70/
 Java: 64bit/jdk1.6.0 -XX:+UseSerialGC

 All tests passed

 Build Log:
 [...truncated 8383 lines...]
 [junit4:junit4] ERROR: JVM J0 ended with an exception, command line: 
 /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java 
 -XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError 
 -XX:HeapDumpPath=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/heapdumps
  -Dtests.prefix=tests -Dtests.seed=BC09482A7937D842 -Xmx512M -Dtests.iters= 
 -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
 -Dtests.postingsformat=random -Dtests.locale=random -Dtests.timezone=random 
 -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz 
 -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass 
 -Djava.util.logging.config.file=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/testlogging.properties
  -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
 -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
 -Djava.io.tmpdir=. 
 -Djunit4.tempDir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
  
 -Dclover.db.dir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
  -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
 -Djava.security.policy=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
  -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
 -Djava.awt.headless=true -Dfile.encoding=ISO-8859-1 -classpath 
 

[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values

2013-01-10 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549651#comment-13549651
 ] 

Varun Thacker commented on LUCENE-3354:
---

Hi,

I have a question about FieldCache support for multiValued fields in general. 
FieldCache on a multiValued field works by consuming it from 
FieldCache.DocTermOrds, but:

* I was trying out FunctionQuery in Solr and still got a "cannot FieldCache on 
multiValued field" error. This is because any impl of FieldCacheSource, for 
example StrFieldSource#getValues(), returns DocTermsIndexDocValues, which 
loads a FieldCache.DocTermsIndex instance. Is this supposed to be consumed 
like this?

* Secondly, slightly off topic, but I went through the lucene4547 branch where 
there was a discussion on how to consume DocValues. I'm still trying to figure 
out a lot of stuff around DocValues, FieldCache etc., but do we need to discuss 
all these issues and their impact on Solr and ES as a whole?

 Extend FieldCache architecture to multiple Values
 -

 Key: LUCENE-3354
 URL: https://issues.apache.org/jira/browse/LUCENE-3354
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Bill Bell
 Fix For: 4.0-ALPHA

 Attachments: LUCENE-3354.patch, LUCENE-3354.patch, 
 LUCENE-3354_testspeed.patch


 I would consider this a bug. It appears lots of people are working around 
 this limitation, so 
 why don't we just change the underlying data structures to natively support 
 multiValued fields in the FieldCache architecture?
 Then functions() will work properly, and we can do things like easily 
 geodist() on a multiValued field.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549654#comment-13549654
 ] 

Uwe Schindler commented on LUCENE-4675:
---

Strong +1 to make BytesRef a byte[] reference only. BytesRef is unfortunately a 
user-facing class in Lucene 4.x, so we have to look into this. I was also 
planning to fix this before 4.0, but we had no time. This was one of the last 
classes Robert and I did not fix in the final cleanup before release, which is 
a pity.

 remove *Ref.copy/append/grow
 

 Key: LUCENE-4675
 URL: https://issues.apache.org/jira/browse/LUCENE-4675
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 These methods are dangerous:
 In general, if we want a StringBuilder-type class, then it should own the 
 array, and it can freely do allocation stuff etc.; this is the only way to 
 make it safe.
 Otherwise, if we want a ByteBuffer-type class, then its reference should be 
 immutable (the byte[]/offset/length should be final), and it should not have 
 allocation stuff.
 BytesRef is none of these; it's like a C pointer. Unfortunately Lucene puts 
 these unsafe, dangerous, trappy APIs directly in front of the user.
 What happens if I have a bug in my application and it accidentally mucks with 
 the term bytes returned by TermsEnum or the payloads from 
 DocsAndPositionsEnum? Will this get merged into a corrupt index?
 I think as a start we should remove these copy/append/grow methods to bring 
 this closer to a ref class (e.g. more like java.lang.ref and less like 
 StringBuilder). Nobody needs this stuff on BytesRef; they can already operate 
 on the bytes directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config

2013-01-10 Thread JIRA
Yago Riveiro Rodríguez created SOLR-4292:


 Summary: After upload and link config collection, the collection 
in solrcloud not load the new config
 Key: SOLR-4292
 URL: https://issues.apache.org/jira/browse/SOLR-4292
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: CentOS release 6.3 (Final)

Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 
x86_64 x86_64 x86_64 GNU/Linux
Reporter: Yago Riveiro Rodríguez


I'm trying to change the settings for a specific collection, which is empty, 
with a new config.

The collection has 2 shards, and ZooKeeper is a cluster of 3 servers.

I used ZooKeeper to upload the configuration and link it with the 
collection. After this, I reloaded the collection on both nodes (replica and 
leader), but when I try to see the STATUS of the collection's core 
(/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error:
 
ST-4A46DF1563_0812:org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
 Specified config does not exist in 
ZooKeeper:statisticsBucket-aggregation-revision-1

The clusterstate.json shows that ST-4A46DF1563_0812 has loaded the 
configName: {"configName":"statisticsBucket-aggregation-revision-1"}

If ZooKeeper has the new config loaded and I linked the config to the 
collection, why does the status of the core say that the configuration is 
missing?

/Yago

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4293) Solr throws an NPE when the extracting update handler is called with an empty document

2013-01-10 Thread Karl Wright (JIRA)
Karl Wright created SOLR-4293:
-

 Summary: Solr throws an NPE when the extracting update handler is 
called with an empty document
 Key: SOLR-4293
 URL: https://issues.apache.org/jira/browse/SOLR-4293
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Karl Wright


When you send an empty document to update/extract, you get this:

{code}
SEVERE: java.lang.NullPointerException
at 
org.apache.solr.handler.extraction.SolrContentHandler.addLiterals(SolrContentHandler.java:164)
at 
org.apache.solr.handler.extraction.SolrContentHandler.newDocument(SolrContentHandler.java:115)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:383)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4293) Solr throws an NPE when the extracting update handler is called with an empty document

2013-01-10 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated SOLR-4293:
--

Attachment: SOLR-4293.patch

This patch should fix the problem.

 Solr throws an NPE when the extracting update handler is called with an 
 empty document
 ---

 Key: SOLR-4293
 URL: https://issues.apache.org/jira/browse/SOLR-4293
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Karl Wright
 Attachments: SOLR-4293.patch


 When you send an empty document to update/extract, you get this:
 {code}
 SEVERE: java.lang.NullPointerException
   at 
 org.apache.solr.handler.extraction.SolrContentHandler.addLiterals(SolrContentHandler.java:164)
   at 
 org.apache.solr.handler.extraction.SolrContentHandler.newDocument(SolrContentHandler.java:115)
   at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120)
   at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
   at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
   at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
   at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
   at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
   at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
   at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
   at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
   at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
   at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
   at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:383)
   at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
   at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
   at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
   at 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config

2013-01-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549712#comment-13549712
 ] 

Yago Riveiro Rodríguez commented on SOLR-4292:
--

My fault, I wrote the confname parameter incorrectly. By the way, ZooKeeper's 
log is so verbose that the error has no visibility.

 After upload and link config collection, the collection in solrcloud not load 
 the new config
 

 Key: SOLR-4292
 URL: https://issues.apache.org/jira/browse/SOLR-4292
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: CentOS release 6.3 (Final)
 Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 
 2012 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Yago Riveiro Rodríguez

 I'm trying to change the settings for a specific collection, which is empty, 
 with a new config.
 The collection has 2 shards, and ZooKeeper is a cluster of 3 servers.
 I used ZooKeeper to upload the configuration and link it with the 
 collection. After this, I reloaded the collection on both nodes (replica and 
 leader), but when I try to see the STATUS of the collection's core 
 (/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error:
  
 ST-4A46DF1563_0812:org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  Specified config does not exist in 
 ZooKeeper:statisticsBucket-aggregation-revision-1
 The clusterstate.json shows that ST-4A46DF1563_0812 has loaded the 
 configName: {"configName":"statisticsBucket-aggregation-revision-1"}
 If ZooKeeper has the new config loaded and I linked the config to the 
 collection, why does the status of the core say that the configuration is 
 missing?
 /Yago

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config

2013-01-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yago Riveiro Rodríguez closed SOLR-4292.


Resolution: Not A Problem

 After upload and link config collection, the collection in solrcloud not load 
 the new config
 

 Key: SOLR-4292
 URL: https://issues.apache.org/jira/browse/SOLR-4292
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: CentOS release 6.3 (Final)
 Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 
 2012 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Yago Riveiro Rodríguez

 I'm trying to change the settings for a specific collection, which is empty, 
 with a new config.
 The collection has 2 shards, and ZooKeeper is a cluster of 3 servers.
 I used ZooKeeper to upload the configuration and link it with the 
 collection. After this, I reloaded the collection on both nodes (replica and 
 leader), but when I try to see the STATUS of the collection's core 
 (/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error:
  
 ST-4A46DF1563_0812:org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  Specified config does not exist in 
 ZooKeeper:statisticsBucket-aggregation-revision-1
 The clusterstate.json shows that ST-4A46DF1563_0812 has loaded the 
 configName: {"configName":"statisticsBucket-aggregation-revision-1"}
 If ZooKeeper has the new config loaded and I linked the config to the 
 collection, why does the status of the core say that the configuration is 
 missing?
 /Yago

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update

2013-01-10 Thread Ben Pennell (JIRA)
Ben Pennell created SOLR-4294:
-

 Summary: Solr 4 atomic update incorrect value when setting two or 
more values to a multivalue via XML update
 Key: SOLR-4294
 URL: https://issues.apache.org/jira/browse/SOLR-4294
 Project: Solr
  Issue Type: Bug
  Components: clients - java, update
Affects Versions: 4.0
 Environment: RHEL
Reporter: Ben Pennell
Priority: Minor
 Fix For: 4.0.1, 4.1


Setting multiple values to a multivalued field via an XML atomic update request 
results in what appears to be the output of a toString() method.  See the 
examples below.

I ran into this issue using the output for atomic updates from the fix for 
SOLR-4133 to ClientUtils.  The server being used is the base 4.0.0 release.

{code}
curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' -d '
<add><doc boost="1.0">
<field name="id">test</field>
<field name="status" update="set">one</field>
<field name="status" update="set">two</field>
</doc></add>'
{code}
Yields the following in Solr:
{code}
  <arr name="status"><str>{set=one}</str><str>{set=two}</str></arr>
{code}

Changing the second set to an add has the same effect.

  If I only set one value though, it works correctly:
{code}
<add><doc boost="1.0">
<field name="id">test</field>
<field name="status" update="set">one</field>
</doc></add>
{code}
  Yields:
{code}
<arr name="status"><str>one</str></arr>
{code}

  It also works fine if I split it into two operations:
{code}
<add><doc boost="1.0">
<field name="id">test</field>
<field name="status" update="set">one</field>
</doc></add>
<add><doc boost="1.0">
<field name="id">test</field>
<field name="status" update="add">two</field>
</doc></add>
{code}
  Yields:
{code}
<arr name="status"><str>one</str><str>two</str></arr>
{code}

  Oddly, it works fine as a single request in JSON:
{code}
curl -k 'http://localhost/solr/update?commit=true' -H 
'Content-type:application/json' -d '[{"id":"test", "status":{"set":["one", 
"two"]}}]'
{code}
  Yields:
{code}
<arr name="status"><str>one</str><str>two</str></arr>
{code}
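
For reference, a SolrJ sketch of the same multi-value set operation (an 
editorial example assuming a Solr 4.0 server at the URL shown; this 
map-per-operation shape is what the SOLR-4133 ClientUtils fix serializes to 
XML):

{code}
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "test");

// One atomic-update operation: set "status" to two values at once.
Map<String, Object> setOp = new HashMap<String, Object>();
setOp.put("set", Arrays.asList("one", "two"));
doc.addField("status", setOp);

server.add(doc);
server.commit();
{code}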

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549759#comment-13549759
 ] 

Michael McCandless commented on LUCENE-4620:


Thanks Shai, that new patch worked!

This patch looks great!

It's a little disturbing that every doc must make a new
HashMap<String,BytesRef> at indexing time (seems like a lot of
overhead/objects when the common case just needs to return a single
BytesRef, which could be re-used).  Can we use
Collections.singletonMap when there are no partitions?
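
A sketch of that suggestion (the surrounding names are assumed for 
illustration, not taken from the patch):

{code}
Map<String, BytesRef> ordinals;
if (numPartitions == 1) {
    // Common case: one partition means a single immutable entry and no
    // per-document HashMap allocation.
    ordinals = Collections.singletonMap(fieldName, reusableBytes);
} else {
    ordinals = new HashMap<String, BytesRef>(); // rare multi-partition case
    // ... fill one entry per partition ...
}
{code}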

The decode API (more important than encode) looks like it reuses the
Bytes/IntsRef, so that's good.

Hmm why do we have VInt8.bytesNeeded?  Who uses that?  I think that's
a dangerous API to have; it's better to simply encode and then see
how many bytes it took.

Hmm, it's a little abusive how VInt8.decode changes the offset of the
incoming BytesRef ... I guess this is why you want an upto :)

Net/net this is great progress over what we have today, so +1!

I ran a quick 10M English Wikipedia test w/ just term queries:
{noformat}
Task          QPS base  StdDev    QPS comp  StdDev    Pct diff
HighTerm         12.79  (2.4%)       12.56  (1.2%)       -1.8% (-5% - 1%)
MedTerm          18.04  (1.8%)       17.77  (0.8%)       -1.5% (-4% - 1%)
LowTerm          47.69  (1.1%)       47.56  (1.0%)       -0.3% (-2% - 1%)
{noformat}

The test only has 3 ords per doc so it's not typical ... looks like things 
got a bit slower (or possibly it's noise).

 Explore IntEncoder/Decoder bulk API
 ---

 Key: LUCENE-4620
 URL: https://issues.apache.org/jira/browse/LUCENE-4620
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-4620.patch, LUCENE-4620.patch


 Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
 and decode(int). Originally, we believed that this layer could be useful for 
 other scenarios, but in practice it's used only for writing/reading the 
 category ordinals from payload/DV.
 Therefore, Mike and I would like to explore a bulk API, something like 
 encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
 can still be streaming (as we don't know in advance how many ints will be 
 written), dunno. Will figure this out as we go.
 One thing to check is whether the bulk API can work w/ e.g. facet 
 associations, which can write arbitrary byte[], and so maybe decoding to an 
 IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
 out that associations will use a different bulk API.
 At the end of the day, the requirement is for someone to be able to configure 
 how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
 etc.) and later read, with as little overhead as possible.
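
A sketch of the API shape floated above (these signatures are the proposal 
under discussion, not shipped code):

{code}
public abstract class IntEncoder {
    // Encode all values from 'values' into 'buf', reusing buf's byte[]
    // where possible.
    public abstract void encode(IntsRef values, BytesRef buf);
}

public abstract class IntDecoder {
    // Decode all ints from 'buf' into 'values', reusing values' int[]
    // where possible.
    public abstract void decode(BytesRef buf, IntsRef values);
}
{code}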

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4295) SolrQuery setFacet*() and getFacet*() should have versions that specify the field

2013-01-10 Thread Colin Bartolome (JIRA)
Colin Bartolome created SOLR-4295:
-

 Summary: SolrQuery setFacet*() and getFacet*() should have 
versions that specify the field
 Key: SOLR-4295
 URL: https://issues.apache.org/jira/browse/SOLR-4295
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.0
Reporter: Colin Bartolome
Priority: Minor


Since the parameter names for field-specific faceting parameters are a little 
odd (and undocumented), such as f.field_name.facet.prefix, the SolrQuery 
class should have methods that take a field parameter. The 
SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
great if the rest of the setFacet*() and getFacet*() methods did, too.

The workaround is trivial, albeit clumsy: just create the parameter names by 
hand, as necessary.
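
A sketch of that workaround (field name and values here are illustrative):

{code}
SolrQuery query = new SolrQuery("*:*");
query.setFacet(true);
query.addFacetField("category");

// The one per-field setter that already exists:
query.setFacetPrefix("category", "elec");

// Everything else needs the parameter name spelled out by hand:
query.set("f.category.facet.limit", 10);
query.set("f.category.facet.mincount", 1);
{code}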

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4295) SolrQuery setFacet*() and getFacet*() should have versions that specify the field

2013-01-10 Thread Colin Bartolome (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Bartolome updated SOLR-4295:
--

Description: 
Since the parameter names for field-specific faceting parameters are a little 
odd (and undocumented), such as f.field_name.facet.prefix, the SolrQuery 
class should have methods that take a field parameter. The 
SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
great if the rest of the setFacet*() and getFacet*() methods did, too.

The workaround is trivial, albeit clumsy: just create the parameter names by 
hand, as necessary.

Also, as far as I can tell, there isn't a constant for the "f." prefix. That 
would be helpful, too.

  was:
Since the parameter names for field-specific faceting parameters are a little 
odd (and undocumented), such as f.field_name.facet.prefix, the SolrQuery 
class should have methods that take a field parameter. The 
SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
great if the rest of the setFacet*() and getFacet*() methods did, too.

The workaround is trivial, albeit clumsy: just create the parameter names by 
hand, as necessary.


 SolrQuery setFacet*() and getFacet*() should have versions that specify the 
 field
 -

 Key: SOLR-4295
 URL: https://issues.apache.org/jira/browse/SOLR-4295
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.0
Reporter: Colin Bartolome
Priority: Minor

 Since the parameter names for field-specific faceting parameters are a little 
 odd (and undocumented), such as f.field_name.facet.prefix, the SolrQuery 
 class should have methods that take a field parameter. The 
 SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
 great if the rest of the setFacet*() and getFacet*() methods did, too.
 The workaround is trivial, albeit clumsy: just create the parameter names by 
 hand, as necessary.
 Also, as far as I can tell, there isn't a constant for the "f." prefix. That 
 would be helpful, too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549894#comment-13549894
 ] 

Steve Rowe commented on LUCENE-4431:


Can this be resolved now, since 3.6.2 was released?

 License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add 
 NOTICE.txt
 -

 Key: LUCENE-4431
 URL: https://issues.apache.org/jira/browse/LUCENE-4431
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/other
Affects Versions: 3.6.1, 4.0-BETA
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0, 4.1, 5.0, 3.6.3

 Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch


 - The demo module has servlet-api.jar with an ASF-named license file and the 
 text "TODO: fill in"
 - This also affects Solr: it has a full ASF license file, but that is wrong.
 The servlet-api file is CDDL-licensed: 
 http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same 
 for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549904#comment-13549904
 ] 

Robert Muir commented on LUCENE-4431:
-

No, because it wasn't fixed in 3.6.2.

 License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add 
 NOTICE.txt
 -

 Key: LUCENE-4431
 URL: https://issues.apache.org/jira/browse/LUCENE-4431
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/other
Affects Versions: 3.6.1, 4.0-BETA
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0, 4.1, 5.0, 3.6.3

 Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch


 - The demo module has servlet-api.jar with an ASF-named license file and the 
 text "TODO: fill in"
 - This also affects Solr: it has a full ASF license file, but that is wrong.
 The servlet-api file is CDDL-licensed: 
 http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same 
 for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549914#comment-13549914
 ] 

Steve Rowe commented on LUCENE-4431:


ah right, fix version is 3.6.3

 License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add 
 NOTICE.txt
 -

 Key: LUCENE-4431
 URL: https://issues.apache.org/jira/browse/LUCENE-4431
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/other
Affects Versions: 3.6.1, 4.0-BETA
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0, 4.1, 5.0, 3.6.3

 Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch


 - The demo module has servlet-api.jar with an ASF-named license file and the 
 text "TODO: fill in"
 - This also affects Solr: it has a full ASF license file, but that is wrong.
 The servlet-api file is CDDL-licensed: 
 http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same 
 for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549930#comment-13549930
 ] 

Robert Muir commented on LUCENE-4431:
-

I did those automatically (when JIRA releases, it asks you if you want to move 
out any still-open issues... never saw it before, it's handy though).

But yeah, we should still fix this if we do a 3.6.3 IMO.

 License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add 
 NOTICE.txt
 -

 Key: LUCENE-4431
 URL: https://issues.apache.org/jira/browse/LUCENE-4431
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/other
Affects Versions: 3.6.1, 4.0-BETA
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0, 4.1, 5.0, 3.6.3

 Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch


 - The demo module has servlet-api.jar with an ASF-named license file and the 
 text "TODO: fill in"
 - This also affects Solr: it has a full ASF license file, but that is wrong.
 The servlet-api file is CDDL-licensed: 
 http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same 
 for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-4134:
---

Fix Version/s: 4.1

 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir used for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Steve Rowe
As of now, there are two Blocker issues in JIRA with Fix Version 4.1: 

Dataimporting with SolrCloud Fails
https://issues.apache.org/jira/browse/SOLR-4112

modify release process/scripts to use svn for rc/release publishing 
(svnpubsub)
https://issues.apache.org/jira/browse/LUCENE-4134

(LUCENE-4431 - servlet-api.jar licensing - is listed as Blocker with Fix 
Version including 4.1, but this has been fixed in branch_4x, and was reopened 
only for 3.6.X backporting.)  

LUCENE-4547 https://issues.apache.org/jira/browse/LUCENE-4547 (DocValues 2.0) 
is listed as Blocker with Fix Version including 4.2, but recent commits to 
branches/lucene4547/ include changes to the Lucene41 codec.  Looks like Fix 
Version should be changed to 4.1?

I'd like to release soon.  What else blocks this?

Steve

On Dec 31, 2012, at 2:08 PM, Mark Miller markrmil...@gmail.com wrote:

 I've started pushing on JIRA issues for a 4.1 release.
 
 If something is pushed that you are going to work on in the very near term, 
 please put it back.
 
 I'll progressively get more aggressive about pushing and count on committers 
 to fix any mistakes if they want something in 4.1.
 
 Remember, 4.2 can come shortly after 4.1.
 
 Next I will be pushing any 4.1 issues that have not been updated in a couple 
 months.
 
 - Mark
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4620:
---

Attachment: LUCENE-4620.patch

bq. Can we use Collections.singletonMap when there are no partitions?

Done. Note though that BytesRef cannot be reused in the case of 
PerDimensionIndexingParams (i.e. multiple CLPs). This is not the common case, 
but it's not trivial to specialize it. Maybe as a second iteration. I did put a 
TODO in FacetFields to allow reuse.

bq. why do we have VInt8.bytesNeeded? Who uses that?

Currently no one uses it, but it was there and I thought that it's a convenient 
API to keep. Why encode and then see how many bytes were occupied?
Anyway, neither the encoders nor the decoders use it. I have no strong feelings 
about keeping/removing it, so if you feel it should be removed, I can do it.

bq. Hmm, it's a little abusive how VInt8.decode changes the offset of the 
incoming BytesRef

It is, but that's the result of Java's lack of pass-by-reference. I.e., decode 
needs to return two values to the caller: the decoded number and how many bytes 
were read. Notice that in the previous byte[] variant, the method took a 
Position class, which is horrible. That's why I documented in decode() that it 
advances bytes.offset, so the caller can restore it in the end. For instance, 
IntDecoder restores the offset to the original one in the end.
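
The resulting calling pattern, as a sketch (VInt8.decode's signature here is 
taken from the patch under discussion, not from a released API):

{code}
int origOffset = bytes.offset;
int upto = bytes.offset + bytes.length; // iterate to offset+length, not bytes.length
while (bytes.offset < upto) {
    int value = VInt8.decode(bytes); // advances bytes.offset past the vint
    // ... consume value ...
}
bytes.offset = origOffset; // restore, as decode() documents
{code}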

On LUCENE-4675 Robert gave me an idea to create a BytesRefIterator, and I 
started to play with it. I.e. it would wrap a BytesRef but add 'pos' and 'upto' 
indexes.
The user can modify 'pos' freely, without touching bytes.offset. That 
introduces an object allocation though, and since I'd want to reuse that object 
wherever
possible, I think I'll look at it after finishing this issue. It already 
contains too many changes.

bq. I guess this is why you want an upto

No, I wanted upto because iterating up to bytes.length is incorrect. You need 
to iterate up to offset+length. BytesRefIterator.pos and BytesRefIterator.upto 
solve these cases for me.

bq. looks like things got a bit slower (or possibly it's noise)

First, even if it's not noise, the slowdown IMO is worth the code 
simplification. But, I do believe that we'll see gains when there are more than 
3 integers to encode/decode.
In fact, the facets test package has an EncodingSpeed class which measures the 
time it takes to encode/decode a large number of integers (a few thousand). 
When I compared the result to 4x (i.e. without the patch), the decode time 
seemed to be ~5x faster.

In this patch I added an Ant task run-encoding-benchmark which runs this 
class. Want to give it a try on your beast machine? For 4x, you can just copy 
the target to lucene/facet/build.xml, I believe it will work without issues.

 Explore IntEncoder/Decoder bulk API
 ---

 Key: LUCENE-4620
 URL: https://issues.apache.org/jira/browse/LUCENE-4620
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch


 Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
 and decode(int). Originally, we believed that this layer could be useful for 
 other scenarios, but in practice it's used only for writing/reading the 
 category ordinals from payload/DV.
 Therefore, Mike and I would like to explore a bulk API, something like 
 encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
 can still be streaming (as we don't know in advance how many ints will be 
 written), dunno. Will figure this out as we go.
 One thing to check is whether the bulk API can work w/ e.g. facet 
 associations, which can write arbitrary byte[], and so maybe decoding to an 
 IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
 out that associations will use a different bulk API.
 At the end of the day, the requirement is for someone to be able to configure 
 how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
 etc.) and later read, with as little overhead as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results

2013-01-10 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-4286:
---

Priority: Blocker  (was: Major)

 Atomic Updates on multi-valued fields giving unexpected results
 ---

 Key: SOLR-4286
 URL: https://issues.apache.org/jira/browse/SOLR-4286
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: Windows 7 64-bit
Reporter: Abhinav Shah
Assignee: Shalin Shekhar Mangar
Priority: Blocker

 I am using Apache Solr 4.0.
 I am trying to post the following document:
 {code}
 curl http://irvis016:8983/solr/collection1/update?commit=true -H 
 "Content-Type: text/xml" --data-binary '<add commitWithin="5000"><doc 
 boost="1.0"><field name="accessionNumber" update="set">3165297</field><field 
 name="status" update="set">ORDERED</field><field name="account.accountName" 
 update="set">US LABS DEMO ACCOUNT</field><field 
 name="account.addresses.address1" update="set">2601 Campus 
 Drive</field><field name="account.addresses.city" 
 update="set">Irvine</field><field name="account.addresses.state" 
 update="set">CA</field><field name="account.addresses.zip" 
 update="set">92622</field><field name="account.externalIds.sourceSystem" 
 update="set">10442</field><field name="orderingPhysician.lcProviderNumber" 
 update="set">60086</field><field name="patient.lpid" 
 update="set">5571351625769103</field><field 
 name="patient.patientName.lastName" update="set">test</field><field 
 name="patient.patientName.firstName" update="set">test123</field><field 
 name="patient.patientSSN" update="set">643522342</field><field 
 name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z</field><field 
 name="patient.mrNs.mrn" update="set">5423</field><field 
 name="specimens.specimenType" update="set">Bone Marrow</field><field 
 name="specimens.specimenType" update="set">Nerve tissue</field><field 
 name="UID">3165297USLABS2012</field></doc></add>'
 {code}
 This document gets successfully posted. However, the multi-valued field 
 'specimens.specimenType' gets stored as follows in Solr:
 {code}
 <arr name="specimens.specimenType">
 <str>{set=Bone Marrow}</str>
 <str>{set=Nerve tissue}</str>
 </arr>
 {code}
 I did not expect "{set=" to be stored along with the text "Bone Marrow".
 My Solr schema XML definition for the field specimens.specimenType is:
 {code}
 <field indexed="true" multiValued="true" name="specimens.specimenType" 
 omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true" 
 stored="true" termVectors="false" type="text_en"/>
 {code}
 Can someone help?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update

2013-01-10 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-4294:
---

Priority: Blocker  (was: Minor)

 Solr 4 atomic update incorrect value when setting two or more values to a 
 multivalue via XML update
 ---

 Key: SOLR-4294
 URL: https://issues.apache.org/jira/browse/SOLR-4294
 Project: Solr
  Issue Type: Bug
  Components: clients - java, update
Affects Versions: 4.0
 Environment: RHEL
Reporter: Ben Pennell
Priority: Blocker
 Fix For: 4.0.1, 4.1


 Setting multiple values to a multivalued field via an XML atomic update 
 request results in what appears to be the output of a toString() method. 
  See the examples below.
 I ran into this issue using the output for atomic updates from the fix for 
 SOLR-4133 to ClientUtils.  The server being used is the base 4.0.0 release.
 {code}
 curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' 
 -d '
 <add><doc boost="1.0">
 <field name="id">test</field>
 <field name="status" update="set">one</field>
 <field name="status" update="set">two</field>
 </doc></add>'
 {code}
 Yields the following in Solr:
 {code}
   <arr name="status"><str>{set=one}</str><str>{set=two}</str></arr>
 {code}
 Changing the second set to an add has the same effect.
   If I only set one value though, it works correctly:
 {code}
 <add><doc boost="1.0">
 <field name="id">test</field>
 <field name="status" update="set">one</field>
 </doc></add>
 {code}
   Yields:
 {code}
 <arr name="status"><str>one</str></arr>
 {code}
   It also works fine if I split it into two operations:
 {code}
 <add><doc boost="1.0">
 <field name="id">test</field>
 <field name="status" update="set">one</field>
 </doc></add>
 <add><doc boost="1.0">
 <field name="id">test</field>
 <field name="status" update="add">two</field>
 </doc></add>
 {code}
   Yields:
 {code}
 <arr name="status"><str>one</str><str>two</str></arr>
 {code}
   Oddly, it works fine as a single request in JSON:
 {code}
 curl -k 'http://localhost/solr/update?commit=true' -H 
 'Content-type:application/json' -d '[{"id":"test", "status":{"set":["one", 
 "two"]}}]'
 {code}
   Yields:
 {code}
 <arr name="status"><str>one</str><str>two</str></arr>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update

2013-01-10 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-4294:
---

Assignee: Shalin Shekhar Mangar

 Solr 4 atomic update incorrect value when setting two or more values to a 
 multivalue via XML update
 ---

 Key: SOLR-4294
 URL: https://issues.apache.org/jira/browse/SOLR-4294
 Project: Solr
  Issue Type: Bug
  Components: clients - java, update
Affects Versions: 4.0
 Environment: RHEL
Reporter: Ben Pennell
Assignee: Shalin Shekhar Mangar
Priority: Blocker
 Fix For: 4.0.1, 4.1


 Setting multiple values to a multivalued field via an XML atomic update 
 request is resulting in what appears to be the output of a toString() method. 
  See the examples below.
 I ran into this issue using the output for atomic updates from the fix for 
 Solr-4133 to ClientUtils.  The server being used is the base 4.0.0 release.
 {code}
 curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' 
 -d '
 adddoc boost=1.0
 field name=idtest/field
 field name=status update=setone/field
 field name=status update=settwo/field
 /doc/add'
 {code}
 Yields the following in Solr:
 {code}
   arr name=statusstr{set=one}/strstr{set=two}/str/arr
 {code}
 Changing the second set to an add has the same effect.
   If I only set one value though, it works correctly:
 {code}
 <add><doc boost="1.0">
 <field name="id">test</field>
 <field name="status" update="set">one</field>
 </doc></add>
 {code}
   Yields:
 {code}
 <arr name="status"><str>one</str></arr>
 {code}
   It also works fine if I split it into two operations
 {code}
 <add><doc boost="1.0">
 <field name="id">test</field>
 <field name="status" update="set">one</field>
 </doc></add>
 <add><doc boost="1.0">
 <field name="id">test</field>
 <field name="status" update="add">two</field>
 </doc></add>
 {code}
   Yields:
 {code}
 <arr name="status"><str>one</str><str>two</str></arr>
 {code}
   Oddly, it works fine as a single request in JSON:
 {code}
 curl -k 'http://localhost/solr/update?commit=true' -H 
 'Content-type:application/json' -d 
 '[{"id":"test", "status":{"set":["one", "two"]}}]'
 {code}
   Yields:
 {code}
 <arr name="status"><str>one</str><str>two</str></arr>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Erik Hatcher
I set a couple of others to Blocker just now, which are related, probably dups. 
 Shalin is assigned to them both.

   Solr 4 atomic update incorrect value when setting two or more values to a 
multivalue via XML update
   https://issues.apache.org/jira/browse/SOLR-4294

and 

   Atomic Updates on multi-valued fields giving unexpected results
   https://issues.apache.org/jira/browse/SOLR-4286

Hopefully these aren't too bad and can make it in as well.

Erik

  

On Jan 10, 2013, at 14:12 , Steve Rowe wrote:

 As of now, there are two Blocker issues in JIRA with Fix Version 4.1: 
 
   Dataimporting with SolrCloud Fails
   https://issues.apache.org/jira/browse/SOLR-4112
 
   modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
   https://issues.apache.org/jira/browse/LUCENE-4134
 
 (LUCENE-4431 - servlet-api.jar licensing - is listed as Blocker with Fix 
 Version including 4.1, but this has been fixed in branch_4x, and was reopened 
 only for 3.6.X backporting.)  
 
 LUCENE-4547 https://issues.apache.org/jira/browse/LUCENE-4547 (DocValues 
 2.0) is listed as Blocker with Fix Version including 4.2, but recent commits 
 to branches/lucene4547/ include changes to the Lucene41 codec.  Looks like 
 Fix Version should be changed to 4.1?
 
 I'd like to release soon.  What else blocks this?
 
 Steve
 
 On Dec 31, 2012, at 2:08 PM, Mark Miller markrmil...@gmail.com wrote:
 
 I've started pushing on JIRA issue for a 4.1 release.
 
 If something is pushed that you are going to work on in the very near term, 
 please put it back.
 
 I'll progressively get more aggressive about pushing and count on committers 
 to fix any mistakes if they want something in 4.1.
 
 Remember, 4.2 can come shortly after 4.1.
 
 Next I will be pushing any 4.1 issues that have not been updated in a couple 
 months.
 
 - Mark
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results

2013-01-10 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549957#comment-13549957
 ] 

Yonik Seeley commented on SOLR-4286:


Hopefully this is already fixed.  Can you try a recent nightly build of 4x 
(soon to become 4.1)?
http://wiki.apache.org/solr/NightlyBuilds

 Atomic Updates on multi-valued fields giving unexpected results
 ---

 Key: SOLR-4286
 URL: https://issues.apache.org/jira/browse/SOLR-4286
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: Windows 7 64-bit
Reporter: Abhinav Shah
Assignee: Shalin Shekhar Mangar
Priority: Blocker

 I am using apache-solr 4.0.
 I am trying to post the following document - 
 {code}
 curl http://irvis016:8983/solr/collection1/update?commit=true -H 
 "Content-Type: text/xml" --data-binary '<add commitWithin="5000"><doc boost="1.0">
 <field name="accessionNumber" update="set">3165297</field>
 <field name="status" update="set">ORDERED</field>
 <field name="account.accountName" update="set">US LABS DEMO ACCOUNT</field>
 <field name="account.addresses.address1" update="set">2601 Campus Drive</field>
 <field name="account.addresses.city" update="set">Irvine</field>
 <field name="account.addresses.state" update="set">CA</field>
 <field name="account.addresses.zip" update="set">92622</field>
 <field name="account.externalIds.sourceSystem" update="set">10442</field>
 <field name="orderingPhysician.lcProviderNumber" update="set">60086</field>
 <field name="patient.lpid" update="set">5571351625769103</field>
 <field name="patient.patientName.lastName" update="set">test</field>
 <field name="patient.patientName.firstName" update="set">test123</field>
 <field name="patient.patientSSN" update="set">643522342</field>
 <field name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z</field>
 <field name="patient.mrNs.mrn" update="set">5423</field>
 <field name="specimens.specimenType" update="set">Bone Marrow</field>
 <field name="specimens.specimenType" update="set">Nerve tissue</field>
 <field name="UID">3165297USLABS2012</field></doc></add>'
 {code}
 This document gets successfully posted. However, the multi-valued field 
 'specimens.specimenType', gets stored as following in SOLR -
 {code}
 <arr name="specimens.specimenType">
 <str>{set=Bone Marrow}</str>
 <str>{set=Nerve tissue}</str>
 </arr>
 {code}
 I did not expect "{set=" to be stored along with the text "Bone Marrow".
 My Solr schema xml definition for the field specimens.SpecimenType is - 
 {code}
 <field indexed="true" multiValued="true" name="specimens.specimenType" 
 omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true" 
 stored="true" termVectors="false" type="text_en"/>
 {code}
 Can someone help?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Robert Muir
On Thu, Jan 10, 2013 at 11:12 AM, Steve Rowe sar...@gmail.com wrote:

 LUCENE-4547 https://issues.apache.org/jira/browse/LUCENE-4547 (DocValues 
 2.0) is listed as Blocker with Fix Version including 4.2, but recent commits 
 to branches/lucene4547/ include changes to the Lucene41 codec.  Looks like 
 Fix Version should be changed to 4.1?


This is a pretty bad bug (you cannot use docvalues with large
segments: I initially made it blocker for that reason), but I think we
are making good progress at a good pace.

My personal opinion: It's fine to just move it out to 4.2; I'd rather
have the time to get everything nice. A 4.1 would be an improvement on
its own, even if there are known problems like that.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Mark Miller

On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote:

 I'd like to release soon.  What else blocks this?

I think we should toss out a short term date (next tuesday?) for anyone to get 
in what they need for 4.1.

Then just consider blockers after branching?

Then release?

Objections, better ideas?

I think we should give a bit of time for people to finish up what's in flight 
or fix any blockers. Then we should heighten testing and allow for any new 
blockers, and then kick it out. If we need to do a 4.2 shortly after, so be it.

- Mark
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549961#comment-13549961
 ] 

Michael McCandless commented on LUCENE-4620:


{quote}
bq. Can we use Collections.singletonMap when there are no partitions?

Done. Note though that BytesRef cannot be reused in the case of 
PerDimensionIndexingParams (i.e. multiple CLPs). This is not the common case, 
but it's not trivial to specialize it. Maybe as a second iteration. I did put a 
TODO in FacetFields to allow reuse.
{quote}

Well, we'd somehow need N BytesRefs to reuse (one per CLP) ... but I
don't think we should worry about that now.

It is unfortunate that the common case is often held back by the full
flexibility/generality of the facet module ... sometimes I think we
need a facet-light module.  But maybe if we can get the specialization
done we don't need facet-light ...

{quote}
bq. why do we have VInt8.bytesNeeded? Who uses that?

Currently no one uses it, but it was there and I thought that it's a convenient 
API to keep. Why encode and then see how many bytes were occupied?
Anyway, neither the encoders nor the decoders use it. I have no strong feelings 
for keeping/removing it, so if you feel like it should be removed, I can do it.
{quote}

I think we should remove it: it's a dangerous API because it can
encourage consumers to do things like call bytesNeeded first (to know
how much to grow their buffer, say) followed by encoding.  The slow
part of vInt encoding is all those ifs ...
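For illustration, here is the usual shape of a vInt encode loop (a sketch, not
the actual VInt8 source):
{code}
// Illustrative sketch only -- not the real VInt8 code.  Each byte carries
// 7 payload bits; the high bit flags a continuation byte.  The branch per
// byte is the "ifs" cost above: a separate bytesNeeded(value) call would
// walk the same branches a second time before encoding even starts.
static int encode(int value, byte[] buf, int offset) {
  while ((value & ~0x7F) != 0) {
    buf[offset++] = (byte) ((value & 0x7F) | 0x80);
    value >>>= 7;
  }
  buf[offset++] = (byte) value;
  return offset; // bytes written = returned offset - starting offset
}
{code}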

{quote}
bq. Hmm, it's a little abusive how VInt8.decode changes the offset of the 
incoming BytesRef

It is, but that's a result of Java's lack of pass-by-reference; decode needs 
to return two values to the caller: the decoded number and how many bytes 
were read.
Notice that in the previous byte[] variant, the method took a class Position, 
which is horrible. That's why I documented in decode() that it advances 
bytes.offset, so the caller can restore it at the end. For instance, 
IntDecoder restores the offset to the original one at the end.

On LUCENE-4675 Robert gave me an idea to create a BytesRefIterator, and I 
started to play with it. I.e. it would wrap a BytesRef but add 'pos' and 
'upto' indexes. The user can modify 'pos' freely, without touching 
bytes.offset. That introduces an object allocation though, and since I'd want 
to reuse that object wherever possible, I think I'll look at it after 
finishing this issue. It already contains too many changes.
{quote}

OK.
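
To make that offset contract concrete, a sketch of the calling pattern (the
consume() call and variable names are illustrative, not code from the patch):
{code}
// decode() advances bytes.offset past the bytes it consumed, so the caller
// saves the original offset and restores it when done, as IntDecoder does.
int savedOffset = bytes.offset;
final int upto = bytes.offset + bytes.length; // not bytes.bytes.length!
while (bytes.offset < upto) {
  int value = VInt8.decode(bytes); // mutates bytes.offset
  consume(value);                  // hypothetical consumer
}
bytes.offset = savedOffset;        // restore for the next user
{code}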

{quote}
bq. I guess this is why you want an upto

No, I wanted upto because iterating up to bytes.length is incorrect. You need 
to iterate up to offset+length. BytesRefIterator.pos and BytesRefIterator.upto 
solve these cases for me.
{quote}

OK.

{quote}
bq. looks like things got a bit slower (or possibly it's noise)

First, even if it's not noise, the slowdown IMO is worth the code 
simplification.
{quote}

+1

{quote}
But I do believe that we'll see gains when there are more than 3 integers to 
encode/decode.
In fact, the facets test package has an EncodingSpeed class which measures the 
time it takes to encode/decode a large number of integers (a few thousand). 
When I compared the result to 4x (i.e. without the patch), the decode time 
seemed to be ~5x faster.
{quote}

Good!  Would be nice to have a real-world biggish-number-of-facets
benchmark ... I'll ponder how to do that w/ luceneutil.

bq. In this patch I added an Ant task run-encoding-benchmark which runs this 
class. Want to give it a try on your beast machine? For 4x, you can just copy 
the target to lucene/facet/build.xml, I believe it will work without issues.

OK I'll run it!


 Explore IntEncoder/Decoder bulk API
 ---

 Key: LUCENE-4620
 URL: https://issues.apache.org/jira/browse/LUCENE-4620
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch


 Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
 and decode(int). Originally, we believed that this layer can be useful for 
 other scenarios, but in practice it's used only for writing/reading the 
 category ordinals from payload/DV.
 Therefore, Mike and I would like to explore a bulk API, something like 
 encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
 can still be streaming (as we don't know in advance how many ints will be 
 written), dunno. Will figure this out as we go.
 One thing to check is whether the bulk API can work w/ e.g. facet 
 associations, which can write arbitrary byte[], and so maybe decoding to an 
 IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
 out that associations will use a different bulk API.
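 One possible shape for such a bulk API, as a sketch only (not a committed 
 signature):
 {code}
 // Sketch of the signatures under discussion -- not a committed API.
 public abstract class IntEncoder {
   /** Encodes values[offset, offset+length) into bytes, reusing bytes.bytes. */
   public abstract void encode(IntsRef values, BytesRef bytes);
 }
 public abstract class IntDecoder {
   /** Decodes bytes[offset, offset+length) into values, growing as needed. */
   public abstract void decode(BytesRef bytes, IntsRef values);
 }
 {code}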
 

Re: 4.1 release

2013-01-10 Thread Steve Rowe
I'd like to start sooner than next Tuesday.

I propose to make the branch tomorrow, and only allow Blocker issues to hold up 
the release after that.

A release candidate should then be possible by the middle of next week.

Steve

On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote:

 
 On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote:
 
 I'd like to release soon.  What else blocks this?
 
 I think we should toss out a short term date (next tuesday?) for anyone to 
 get in what they need for 4.1.
 
 Then just consider blockers after branching?
 
 Then release?
 
 Objections, better ideas?
 
 I think we should give a bit of time for people to finish up what's in flight 
 or fix any blockers. Then we should heighten testing and allow for any new 
 blockers, and then kick it out. If we need to do a 4.2 shortly after, so be 
 it.
 
 - Mark
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #734: POMs out of sync

2013-01-10 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/734/

1 tests failed.
FAILED:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
shard1 should have just been set up to be inconsistent - but it's still consistent

Stack Trace:
java.lang.AssertionError: shard1 should have just been set up to be inconsistent - but it's still consistent
        at __randomizedtesting.SeedInfo.seed([5A32B9FE8374BE51:DBD437E6F42BDE6D]:0)
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.assertTrue(Assert.java:43)
        at org.junit.Assert.assertNotNull(Assert.java:526)
        at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:214)
        at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:794)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 

[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549975#comment-13549975
 ] 

Michael McCandless commented on LUCENE-4620:


Trunk:
{noformat}
 [java] Estimating ~1 Integers compression time by
 [java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
 [java]
 [java] Encoder                                                  Bits/Int  Encode [ms]  Encode [us/int]  Decode [ms]  Decode [us/int]
 [java] -----------------------------------------------------------------------------------------------------------------------------
 [java] VInt8                                                     18.4955         4430          44.3003         1162          11.6201
 [java] Sorting (Unique (VInt8))                                  18.4955         4344          43.4403         1105          11.0501
 [java] Sorting (Unique (DGap (VInt8)))                            8.5597         4481          44.8103          842           8.4201
 [java] Sorting (Unique (DGap (EightFlags (VInt8))))               4.9679         4636          46.3603         1021          10.2101
 [java] Sorting (Unique (DGap (FourFlags (VInt8))))                4.8198         4515          45.1503         1001          10.0101
 [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))    4.5794         4904          49.0403         1056          10.5601
 [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))    4.5794         4751          47.5103         1035          10.3501
 [java]
 [java]
 [java] Estimating ~1 Integers compression time by
 [java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
 [java]
 [java] Encoder                                                  Bits/Int  Encode [ms]  Encode [us/int]  Decode [ms]  Decode [us/int]
 [java] -----------------------------------------------------------------------------------------------------------------------------
 [java] VInt8                                                     18.2673         1241          12.4100         1128          11.2800
 [java] Sorting (Unique (VInt8))                                  18.2673         3488          34.8801          924           9.2400
 [java] Sorting (Unique (DGap (VInt8)))                            8.9456         3061          30.6101          660           6.6000
 [java] Sorting (Unique (DGap (EightFlags (VInt8))))               5.7542         3693          36.9301         1026          10.2600
 [java] Sorting (Unique (DGap (FourFlags (VInt8))))                5.5447         3462          34.6201          811           8.1100
 [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))    5.3566         3846          38.4601         1018          10.1800
 [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))    5.3996         3879          38.7901         1025          10.2500
 [java]
 [java]
 [java] Estimating ~1 Integers compression time by
 [java] Encoding/decoding facets' ID payload of docID = 1 (unsorted, length of: 18) 555 times.
 [java]
 [java] Encoder                                                  Bits/Int  Encode [ms]  Encode [us/int]  Decode [ms]  Decode [us/int]
 [java] -----------------------------------------------------------------------------------------------------------------------------
 [java] VInt8                                                     20.8889         1179          11.7900         1114          11.1400
 [java] Sorting (Unique (VInt8))                                  20.8889         2251          22.5100         1171          11.7100
 [java] Sorting (Unique (DGap (VInt8)))                           12.             2174          21.7400          848           8.4800
 [java] Sorting (Unique (DGap (EightFlags (VInt8))))              10.             2372          23.7200         1092

[jira] [Created] (LUCENE-4677) Use vInt to encode node addresses inside FST

2013-01-10 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4677:
--

 Summary: Use vInt to encode node addresses inside FST
 Key: LUCENE-4677
 URL: https://issues.apache.org/jira/browse/LUCENE-4677
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.2, 5.0


Today we use int, but towards enabling > 2.1G sized FSTs, I'd like to make this 
vInt instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4677) Use vInt to encode node addresses inside FST

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-4677:
--

Assignee: Michael McCandless

 Use vInt to encode node addresses inside FST
 

 Key: LUCENE-4677
 URL: https://issues.apache.org/jira/browse/LUCENE-4677
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0


 Today we use int, but towards enabling > 2.1G sized FSTs, I'd like to make 
 this vInt instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results

2013-01-10 Thread Abhinav Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550245#comment-13550245
 ] 

Abhinav Shah commented on SOLR-4286:


I tried on nightly build - apache-solr-4.1-2013-01-10_05-50-28.zip, and it 
works.

Thanks

 Atomic Updates on multi-valued fields giving unexpected results
 ---

 Key: SOLR-4286
 URL: https://issues.apache.org/jira/browse/SOLR-4286
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: Windows 7 64-bit
Reporter: Abhinav Shah
Assignee: Shalin Shekhar Mangar
Priority: Blocker

 I am using apache-solr 4.0.
 I am trying to post the following document - 
 {code}
 curl http://irvis016:8983/solr/collection1/update?commit=true -H 
 "Content-Type: text/xml" --data-binary '<add commitWithin="5000"><doc boost="1.0">
 <field name="accessionNumber" update="set">3165297</field>
 <field name="status" update="set">ORDERED</field>
 <field name="account.accountName" update="set">US LABS DEMO ACCOUNT</field>
 <field name="account.addresses.address1" update="set">2601 Campus Drive</field>
 <field name="account.addresses.city" update="set">Irvine</field>
 <field name="account.addresses.state" update="set">CA</field>
 <field name="account.addresses.zip" update="set">92622</field>
 <field name="account.externalIds.sourceSystem" update="set">10442</field>
 <field name="orderingPhysician.lcProviderNumber" update="set">60086</field>
 <field name="patient.lpid" update="set">5571351625769103</field>
 <field name="patient.patientName.lastName" update="set">test</field>
 <field name="patient.patientName.firstName" update="set">test123</field>
 <field name="patient.patientSSN" update="set">643522342</field>
 <field name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z</field>
 <field name="patient.mrNs.mrn" update="set">5423</field>
 <field name="specimens.specimenType" update="set">Bone Marrow</field>
 <field name="specimens.specimenType" update="set">Nerve tissue</field>
 <field name="UID">3165297USLABS2012</field></doc></add>'
 {code}
 This document gets successfully posted. However, the multi-valued field 
 'specimens.specimenType', gets stored as following in SOLR -
 {code}
 <arr name="specimens.specimenType">
 <str>{set=Bone Marrow}</str>
 <str>{set=Nerve tissue}</str>
 </arr>
 {code}
 I did not expect "{set=" to be stored along with the text "Bone Marrow".
 My Solr schema xml definition for the field specimens.SpecimenType is - 
 {code}
 <field indexed="true" multiValued="true" name="specimens.specimenType" 
 omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true" 
 stored="true" termVectors="false" type="text_en"/>
 {code}
 Can someone help?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4677) Use vInt to encode node addresses inside FST

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4677:
---

Attachment: LUCENE-4677.patch

Initial patch ... not committable until I add a back-compat layer somehow ... 
(how come TestBackCompat isn't failing...).

I tested Kuromoji's TokenInfo FST, temporarily turning off packing: vInt 
encoding made the non-packed FST ~12% smaller (good!).  The packed FST is 
unchanged in size.

Then I tested on a bigger FST (AnalyzingSuggester build of FreeDB's song 
titles) and the resulting FST is nearly the same size (1.0463 GB for trunk and 
1.0458 with patch).

 Use vInt to encode node addresses inside FST
 

 Key: LUCENE-4677
 URL: https://issues.apache.org/jira/browse/LUCENE-4677
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4677.patch


 Today we use int, but towards enabling > 2.1G sized FSTs, I'd like to make 
 this vInt instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results

2013-01-10 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-4286.


Resolution: Duplicate
  Assignee: (was: Shalin Shekhar Mangar)

 Atomic Updates on multi-valued fields giving unexpected results
 ---

 Key: SOLR-4286
 URL: https://issues.apache.org/jira/browse/SOLR-4286
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: Windows 7 64-bit
Reporter: Abhinav Shah
Priority: Blocker

 I am using apache-solr 4.0.
 I am trying to post the following document - 
 {code}
 curl http://irvis016:8983/solr/collection1/update?commit=true -H 
 "Content-Type: text/xml" --data-binary '<add commitWithin="5000"><doc boost="1.0">
 <field name="accessionNumber" update="set">3165297</field>
 <field name="status" update="set">ORDERED</field>
 <field name="account.accountName" update="set">US LABS DEMO ACCOUNT</field>
 <field name="account.addresses.address1" update="set">2601 Campus Drive</field>
 <field name="account.addresses.city" update="set">Irvine</field>
 <field name="account.addresses.state" update="set">CA</field>
 <field name="account.addresses.zip" update="set">92622</field>
 <field name="account.externalIds.sourceSystem" update="set">10442</field>
 <field name="orderingPhysician.lcProviderNumber" update="set">60086</field>
 <field name="patient.lpid" update="set">5571351625769103</field>
 <field name="patient.patientName.lastName" update="set">test</field>
 <field name="patient.patientName.firstName" update="set">test123</field>
 <field name="patient.patientSSN" update="set">643522342</field>
 <field name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z</field>
 <field name="patient.mrNs.mrn" update="set">5423</field>
 <field name="specimens.specimenType" update="set">Bone Marrow</field>
 <field name="specimens.specimenType" update="set">Nerve tissue</field>
 <field name="UID">3165297USLABS2012</field></doc></add>'
 {code}
 This document gets successfully posted. However, the multi-valued field 
 'specimens.specimenType', gets stored as following in SOLR -
 {code}
 <arr name="specimens.specimenType">
 <str>{set=Bone Marrow}</str>
 <str>{set=Nerve tissue}</str>
 </arr>
 {code}
 I did not expect "{set=" to be stored along with the text "Bone Marrow".
 My Solr schema xml definition for the field specimens.SpecimenType is - 
 {code}
 <field indexed="true" multiValued="true" name="specimens.specimenType" 
 omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true" 
 stored="true" termVectors="false" type="text_en"/>
 {code}
 Can someone help?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550352#comment-13550352
 ] 

Steve Rowe commented on LUCENE-4134:


bq. [A]s part of this new process there will also be a 
"https://dist.apache.org/repos/dist/dev/lucene" directory where release 
candidates can be put for review (instead of 
people.apache.org/~releasemanager/...), and if/when they are voted successfully 
a simple svn mv to dist/release/lucene makes them official and pushes them to 
the mirrors.

There is a wrinkle here: maven artifacts.  Our current process includes them 
with the ASF release artifacts at the RC review download link.  If we continue 
this when we instead commit RCs to 
{{repos/dist/dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/}}, then the release 
publishing process can't be just {{svn mv 
dev/lucene/{java,solr}/X.Y.ZRCN-rMMM release/lucene/{java,solr}/X.Y.Z}}.  
Instead, we'll have to somehow exclude the maven artifacts, e.g. {{svn rm 
dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/maven}}.

An alternative: now that we stage maven artifacts to Nexus 
(repository.apache.org) prior to the release, we could as part of an RC 
announcement also include the Nexus link.  This option gets my +1.

 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir use for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Mark Miller
-1 from me - I don't like not giving people a target date to clean things up 
by. No one has given a proposed date to try and tie things up by - just calling 
'hike is tomorrow' out of nowhere doesn't seem right to me.

We have a lot of people working on this over a lot of timezones. I think we 
should do the right thing and give everyone at least a few days and a weekend 
to finish getting their issues into 4.1.

- Mark

On Jan 10, 2013, at 2:36 PM, Steve Rowe sar...@gmail.com wrote:

 I'd like to start sooner than next Tuesday.
 
 I propose to make the branch tomorrow, and only allow Blocker issues to hold 
 up the release after that.
 
 A release candidate should then be possible by the middle of next week.
 
 Steve
 
 On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote:
 
 
 On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote:
 
 I'd like to release soon.  What else blocks this?
 
 I think we should toss out a short term date (next tuesday?) for anyone to 
 get in what they need for 4.1.
 
 Then just consider blockers after branching?
 
 Then release?
 
 Objections, better ideas?
 
 I think we should give a bit of time for people to finish up what's in 
 flight or fix any blockers. Then we should heighten testing and allow for 
 any new blockers, and then kick it out. If we need to do a 4.2 shortly 
 after, so be it.
 
 - Mark
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550352#comment-13550352
 ] 

Steve Rowe edited comment on LUCENE-4134 at 1/10/13 8:09 PM:
-

bq. [A]s part of this new process there will also be a 
"https://dist.apache.org/repos/dist/dev/lucene" directory where release 
candidates can be put for review (instead of 
people.apache.org/~releasemanager/...), and if/when they are voted successfully 
a simple svn mv to dist/release/lucene makes them official and pushes them to 
the mirrors.

There is a wrinkle here: maven artifacts.  Our current process includes them 
with the ASF release artifacts at the RC review download link.  If we continue 
this when we instead commit RCs to 
{{repos/dist/dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM/}}, then the release 
publishing process can't be just {{svn mv 
dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM release/lucene/\{java,solr}/X.Y.Z}}.  
Instead, we'll have to somehow exclude the maven artifacts, e.g. {{svn rm 
dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM/maven}}.

An alternative: now that we stage maven artifacts to Nexus 
(repository.apache.org) prior to the release, we could as part of an RC 
announcement also include the Nexus link, and not include the maven artifacts 
in {{repos/dist/dev/lucene/}}.  This option gets my +1.

  was (Author: steve_rowe):
bq. [A]s part of this new process there will also be a 
"https://dist.apache.org/repos/dist/dev/lucene" directory where release 
candidates can be put for review (instead of 
people.apache.org/~releasemanager/...), and if/when they are voted successfully 
a simple svn mv to dist/release/lucene makes them official and pushes them to 
the mirrors.

There is a wrinkle here: maven artifacts.  Our current process includes them 
with the ASF release artifacts at the RC review download link.  If we continue 
this when we instead commit RCs to 
{{repos/dist/dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/}}, then the release 
publishing process can't be just {{svn mv 
dev/lucene/{java,solr}/X.Y.ZRCN-rMMM release/lucene/{java,solr}/X.Y.Z}}.  
Instead, we'll have to somehow exclude the maven artifacts, e.g. {{svn rm 
dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/maven}}.

An alternative: now that we stage maven artifacts to Nexus 
(repository.apache.org) prior to the release, we could as part of an RC 
announcement also include the Nexus link.  This option gets my +1.
  
 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir use for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-10 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4678:
--

 Summary: FST should use paged byte[] instead of single contiguous 
byte[]
 Key: LUCENE-4678
 URL: https://issues.apache.org/jira/browse/LUCENE-4678
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0


The single byte[] we use today has several limitations, eg it limits us to < 
2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
it causes big RAM spikes during building when the array has to grow.

I took basically the same approach as LUCENE-3298, but I want to break out this 
patch separately from changing all int -> long for > 2.1 GB support.
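
For reference, a minimal sketch of the paged-addressing idea (block size and
names are illustrative; this is not the patch itself):
{code}
// Sketch only: a byte[] split into fixed-size blocks, addressed by a long
// position.  With a power-of-two block size, index/offset are a shift and
// a mask; growth allocates one new block instead of copying a huge array.
class PagedBytes {
  static final int BLOCK_BITS = 15;              // 32 KB blocks
  static final int BLOCK_SIZE = 1 << BLOCK_BITS;
  static final int BLOCK_MASK = BLOCK_SIZE - 1;
  private final byte[][] blocks;

  PagedBytes(long numBytes) {
    int numBlocks = (int) ((numBytes + BLOCK_SIZE - 1) >>> BLOCK_BITS);
    blocks = new byte[numBlocks][BLOCK_SIZE];
  }

  // long positions lift the Integer.MAX_VALUE (~2.1 GB) ceiling
  byte readByte(long pos) {
    return blocks[(int) (pos >>> BLOCK_BITS)][(int) (pos & BLOCK_MASK)];
  }
}
{code}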

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4678:
---

Attachment: LUCENE-4678.patch

Patch, I think it's close to ready (no format change for the FST so no back 
compat).

 FST should use paged byte[] instead of single contiguous byte[]
 ---

 Key: LUCENE-4678
 URL: https://issues.apache.org/jira/browse/LUCENE-4678
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4678.patch, LUCENE-4678.patch


 The single byte[] we use today has several limitations, eg it limits us to < 
 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
 it causes big RAM spikes during building when the array has to grow.
 I took basically the same approach as LUCENE-3298, but I want to break out 
 this patch separately from changing all int -> long for > 2.1 GB support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4678:
---

Attachment: LUCENE-4678.patch

Duh, wrong patch ... this one should be right.

 FST should use paged byte[] instead of single contiguous byte[]
 ---

 Key: LUCENE-4678
 URL: https://issues.apache.org/jira/browse/LUCENE-4678
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4678.patch, LUCENE-4678.patch


 The single byte[] we use today has several limitations, eg it limits us to < 
 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
 it causes big RAM spikes during building when the array has to grow.
 I took basically the same approach as LUCENE-3298, but I want to break out 
 this patch separately from changing all int -> long for > 2.1 GB support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3298:
--

Assignee: Michael McCandless

 FST has hard limit max size of 2.1 GB
 -

 Key: LUCENE-3298
 URL: https://issues.apache.org/jira/browse/LUCENE-3298
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-3298.patch


 The FST uses a single contiguous byte[] under the hood, which in java is 
 indexed by int so we cannot grow this over Integer.MAX_VALUE.  It also 
 internally encodes references to this array as vInt.
 We could switch this to a paged byte[] and make the limit far larger.
 But I think this is low priority... I'm not going to work on it any time soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3298:
---

Attachment: LUCENE-3298.patch

Initial test to confirm FSTs can grow beyond 2GB (it fails today!).

 FST has hard limit max size of 2.1 GB
 -

 Key: LUCENE-3298
 URL: https://issues.apache.org/jira/browse/LUCENE-3298
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-3298.patch, LUCENE-3298.patch


 The FST uses a single contiguous byte[] under the hood, which in java is 
 indexed by int so we cannot grow this over Integer.MAX_VALUE.  It also 
 internally encodes references to this array as vInt.
 We could switch this to a paged byte[] and make the limit far larger.
 But I think this is low priority... I'm not going to work on it any time soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550373#comment-13550373
 ] 

Robert Muir commented on LUCENE-4134:
-

Wouldn't another alternative instead just continue to use our p.a.o/~ versus 
deploying to two places?

I don't like having to check a release spread across two different places. 
And this would also make automatic 
verification difficult (today, we can pass the p.a.o link and it checks 
everything)


 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir use for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Steve Rowe
Okay - I can see your logic, Mark, but this is not even close to out of 
nowhere.  You yourself have been vocal about making a 4.1 release for a couple 
weeks now.

I agree with Robert Muir that we should be promoting short turnaround releases. 
 If it doesn't make this release, it'll make the next one, which will come out 
in a relatively short span of time.  In this model, Blocker issues are the 
drivers, not Fix Version. If people want stuff in the release, they should 
mark their issue as Blocker.

How about a compromise - next Monday we branch and only allow Blockers to block 
the release?

Steve

On Jan 10, 2013, at 3:08 PM, Mark Miller markrmil...@gmail.com wrote:

 -1 from me - I don't like not giving people a target date to clean things up 
 by. No one has given a proposed date to try and tie things up by - just 
 calling 'hike is tomorrow' out of nowhere doesn't seem right to me.
 
 We have a lot of people working on this over a lot of timezones. I think we 
 should do the right thing and give everyone at least a few days and a weekend 
 to finish getting their issues into 4.1.
 
 - Mark
 
 On Jan 10, 2013, at 2:36 PM, Steve Rowe sar...@gmail.com wrote:
 
 I'd like to start sooner than next Tuesday.
 
 I propose to make the branch tomorrow, and only allow Blocker issues to hold 
 up the release after that.
 
 A release candidate should then be possible by the middle of next week.
 
 Steve
 
 On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote:
 
 
 On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote:
 
 I'd like to release soon.  What else blocks this?
 
 I think we should toss out a short term date (next tuesday?) for anyone to 
 get in what they need for 4.1.
 
 Then just consider blockers after branching?
 
 Then release?
 
 Objections, better ideas?
 
 I think we should give a bit of time for people to finish up what's in 
 flight or fix any blockers. Then we should heighten testing and allow for 
 any new blockers, and then kick it out. If we need to do a 4.2 shortly 
 after, so be it.
 
 - Mark
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4547) DocValues field broken on large indexes

2013-01-10 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4547:


Priority: Major  (was: Blocker)

 DocValues field broken on large indexes
 ---

 Key: LUCENE-4547
 URL: https://issues.apache.org/jira/browse/LUCENE-4547
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.2, 5.0

 Attachments: test.patch


 I tried to write a test to sanity check LUCENE-4536 (first running against 
 svn revision 1406416, before the change).
 But i found docvalues is already broken here for large indexes that have a 
 PackedLongDocValues field:
 {code}
 final int numDocs = 5;
 for (int i = 0; i < numDocs; ++i) {
   if (i == 0) {
     field.setLongValue(0L); // force > 32bit deltas
   } else {
     field.setLongValue(1<<33L); 
   }
   w.addDocument(doc);
 }
 w.forceMerge(1);
 w.close();
 dir.close(); // checkindex
 {code}
 {noformat}
 [junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #0,6,TGRP-Test2GBDocValues]
 [junit4:junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: -65536
 [junit4:junit4]   2>   at __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
 [junit4:junit4]   2>   at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:535)
 [junit4:junit4]   2>   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:508)
 [junit4:junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65536
 [junit4:junit4]   2>   at org.apache.lucene.util.ByteBlockPool.deref(ByteBlockPool.java:305)
 [junit4:junit4]   2>   at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(FixedStraightBytesImpl.java:115)
 [junit4:junit4]   2>   at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(PackedIntValues.java:109)
 [junit4:junit4]   2>   at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(PackedIntValues.java:80)
 [junit4:junit4]   2>   at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:130)
 [junit4:junit4]   2>   at org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:65)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550383#comment-13550383
 ] 

Hoss Man commented on LUCENE-4134:
--

bq. Wouldn't another alternative instead just continue to use our p.a.o/~ 
versus deploying to two places?

+1

I would suggest that for now we move forward with the simplest possible changes 
to our overall processes that satisfy infra: using the new svn repo for our 
final release dist, but leave everything else related to RCs, and smoke 
checking, as is.

Then we can discuss/iterate on other changes to the release process at our 
leisure (ie: maybe we put the RCs in svn, and tweak the directory structure so 
a simple svn mv works for the dist files, and we have some other script for 
the maven files)

 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir use for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Mark Miller
Saying "tomorrow" without any date that gives anyone any time to do anything is 
out of nowhere to me. People in Europe and east of that will wake up and find 
out, oh, today. While pressure has been building towards a release, no one has 
proposed a date for a cutoff. I think that is only fair. I think that if 
you were desperate to cut off to blockers tomorrow, you should have called for 
that last week.

Robert Muir's short term releases are not threatened by allowing people to plan 
and execute a release together. You can take that too far and do damage from 
the opposite direction. Giving people time to tie things up with a real 
deadline is only fair. We all know a nebulous deadline is not conducive to 
finishing up work.

I think all releases should have a known date that we agree on that gives 
developers some time to finish what they are working on or what they believe is 
important for the release. At a minimum there should be a few days for this. A 
weekend involved only seems fair. This doesn't have to be a long time, but it 
should not require we file blockers and just seems like a friendly way to 
develop together.

Monday is fine by me if others buy into it.

Otherwise, we have taken 4 or 5 months for 4.1. Let's not drag it out another 
month. But let's not do the reverse and release it tonight. The sensible 
approach always seems like we should plan out some target dates on the list - 
dates that actually give devs a chance to respond to - and then follow through 
on those dates.

- Mark

On Jan 10, 2013, at 3:26 PM, Steve Rowe sar...@gmail.com wrote:

 Okay - I can see your logic, Mark, but this is not even close to out of 
 nowhere.  You yourself have been vocal about making a 4.1 release for a 
 couple weeks now.
 
 I agree with Robert Muir that we should be promoting short turnaround 
 releases.  If it doesn't make this release, it'll make the next one, which 
 will come out in a relatively short span of time.  In this model, Blocker 
 issues are the drivers, not Fix Version.If people want stuff in the 
 release, they should mark their issue as Blocker.
 
 How about a compromise - next Monday we branch and only allow Blockers to 
 block the release?
 
 Steve
 
 On Jan 10, 2013, at 3:08 PM, Mark Miller markrmil...@gmail.com wrote:
 
 -1 from me - I don't like not giving people a target date to clean things up 
 by. No one has given a proposed date to try and tie things up by - just 
 calling 'hike is tomorrow' out of nowhere doesn't seem right to me.
 
 We have a lot of people working on this over a lot of timezones. I think we 
 should do the right thing and give everyone at least a few days and a 
 weekend to finish getting their issues into 4.1.
 
 - Mark
 
 On Jan 10, 2013, at 2:36 PM, Steve Rowe sar...@gmail.com wrote:
 
 I'd like to start sooner than next Tuesday.
 
 I propose to make the branch tomorrow, and only allow Blocker issues to 
 hold up the release after that.
 
 A release candidate should then be possible by the middle of next week.
 
 Steve
 
 On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote:
 
 
 On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote:
 
 I'd like to release soon.  What else blocks this?
 
 I think we should toss out a short term date (next tuesday?) for anyone to 
 get in what they need for 4.1.
 
 Then just consider blockers after branching?
 
 Then release?
 
 Objections, better ideas?
 
 I think we should give a bit of time for people to finish up what's in 
 flight or fix any blockers. Then we should heighten testing and allow for 
 any new blockers, and then kick it out. If we need to do a 4.2 shortly 
 after, so be it.
 
 - Mark
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550399#comment-13550399
 ] 

Steve Rowe commented on LUCENE-4134:


bq. Wouldn't another alternative instead just continue to use our p.a.o/~ 
versus deploying to two places?

Yes, you're right: +1

bq. Then we can discuss/iterate on other changes to the release process at our 
leisure (ie: maybe we put the RCs in svn, and tweak the directory structure so 
a simple svn mv works for the dist files, and we have some other script for 
the maven files)

If the {{maven/}} directories weren't there, a simple svn mv would work - no 
other tweaking required.

What other script did you have in mind for the maven files?  Are you talking 
about the need to change the smoke tester if the maven artifacts are moved out 
of the RC?

 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir used for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550406#comment-13550406
 ] 

Robert Muir commented on LUCENE-4134:
-

personally i would prefer if we don't have a separate script for changing the 
maven files.

I'm not really sure what this tester is currently doing: but in my opinion if 
someone gets Lucene 4.1 i should know WTF they got, regardless of whether it's 
from an FTP site or maven.

So if it doesn't exist now, at least in the future I'd like more logic 
cross-checking between 
the two things to ensure they are consistent with each other.

It's scary to me that different build systems are producing different 
artifacts and we don't have this today.

And i know the checking isn't good enough when i see basic shit like things not 
even named
the same way: SOLR-4287


 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir used for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3982) Admin UI: Various Dataimport Improvements

2013-01-10 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3982:


Attachment: SOLR-3982.patch

Updated Patch incorporates SOLR-4151 (normally I try to handle issues 
separately, but this time it's easier to combine them)

Additionally changed:
* Show Info-Area also for 'idle' status
* Make Auto-Refresh optional via Checkbox
* Requests are now JSON and no longer XML 
_(Excluding the Configuration which is only available in XML)_

 Admin UI: Various Dataimport Improvements
 -

 Key: SOLR-3982
 URL: https://issues.apache.org/jira/browse/SOLR-3982
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0
Reporter: Shawn Heisey
Assignee: Stefan Matheis (steffkes)
 Fix For: 4.2, 5.0

 Attachments: SOLR-3982.patch, SOLR-3982.patch


 Started with Shawn's request for a small refresh link; one change leads to 
 the next, which is why I changed this issue into a more general one.
 This Patch brings:
 * A Refresh Status Button
 * An Abort Import Button
 * Improved Status-Handling 
 _(was buggy if you have multiple Cores with Handlers for Dataimport defined 
 and you switched the view while at least one was running)_
 * Additional Stats on Rows/Documents
 _(on-the-fly calculated X Docs/second)_
 * less buggy duration-to-readable-time conversion
 _(until now this resulted in NaNs showing up on your screen)_
 Original Description:
 {quote}The dataimport section under each core on the admin gui does not 
 provide a way to get the current import status.  I actually would like to see 
 it automatically pull the status as soon as you click on Dataimport ... I 
 have never seen an import status with a qtime above 1 millisecond.  A refresh 
 icon/link would be good to have as well.
 Additional note: the resulting URL in the address bar is a little odd:
 http://server:port/solr/#/corename/dataimport//dataimport{quote}
 Although I gave a short explanation of why the URL looks a bit odd:
 The first dataimport is required for the UI to detect which section you're 
 browsing .. the second /dataimport (including the slash, yes) is coming 
 from your solrconfig :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4151) DIH 'debug' mode missing from 4.x UI

2013-01-10 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-4151.
-

   Resolution: Duplicate
Fix Version/s: 4.1
 Assignee: Stefan Matheis (steffkes)

Marking as 'Duplicate', not completely correct but imho better than a (stupid) 
'Fixed'

 DIH 'debug' mode missing from 4.x UI
 

 Key: SOLR-4151
 URL: https://issues.apache.org/jira/browse/SOLR-4151
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.0
Reporter: Hoss Man
Assignee: Stefan Matheis (steffkes)
 Fix For: 4.1


 The new Admin UI in trunk & 4.x supports most of the DIH-related 
 functionality, but the debug options were not implemented.
 http://wiki.apache.org/solr/DataImportHandler#Interactive_Development_Mode

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer commented on LUCENE-3178:
-

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do memcpy in all cases, but often does CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards; so far the only speedups I have seen are 
when I ape most of the behaviors of the JVM. The biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes, so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is both maddening 
for me and even more damning.
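
For readers following along, here is a minimal sketch of the two access 
patterns under discussion, using only standard java.nio; this is illustrative, 
not Lucene's code:

{code}
import java.nio.ByteBuffer;

// Minimal sketch (plain java.nio, not Lucene code): a readByte()-style
// loop pays per-call overhead and bounds checks, while one bulk get()
// can be serviced by an intrinsic, memcpy-like copy.
public class BulkVsSingleByte {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
    byte[] dst = new byte[1 << 20];

    // Single-byte pattern: one bounds-checked call per byte.
    buf.rewind();
    for (int i = 0; i < dst.length; i++) {
      dst[i] = buf.get();
    }

    // Block pattern: one bulk copy for the whole range.
    buf.rewind();
    buf.get(dst, 0, dst.length);
  }
}
{code}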

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer edited comment on LUCENE-3178 at 1/10/13 9:25 PM:
--

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do memcpy in all cases, but often does CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards; so far the only speedups I have seen are 
when I ape most of the behaviors of the JVM. The biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes, so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is both maddening 
for me and even more damning.

  was (Author: gbow...@fastmail.co.uk):
{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do memcpy in all cases, but often does CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards; so far the only speedups I have seen are 
when I ape most of the behaviors of the JVM. The biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes, so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is

[jira] [Commented] (SOLR-3755) shard splitting

2013-01-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550477#comment-13550477
 ] 

Mark Miller commented on SOLR-3755:
---

This has a back-compat break that we should address somehow, or at least 
mention in CHANGES: previously you could specify explicit shard ids and still 
get distributed updates; now if you do that, you won't get distributed 
updates, as shards won't be assigned ranges.
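
To make the break concrete, here is a toy sketch of range-based routing; the 
names and hashing are stand-ins, not Solr's actual router:

{code}
// Toy illustration, not Solr's code: a document is routed to the shard
// whose hash range covers the hash of its id. Shards created with
// explicit ids but no assigned ranges never match, so distributed
// updates fail to route -- the back-compat break described above.
public class RangeRouting {
  static int selectShard(int[] rangeStart, int[] rangeEnd, String docId) {
    int hash = docId.hashCode(); // stand-in for the real hash function
    for (int shard = 0; shard < rangeStart.length; shard++) {
      if (hash >= rangeStart[shard] && hash <= rangeEnd[shard]) {
        return shard;
      }
    }
    return -1; // no owning shard: the update cannot be routed
  }
}
{code}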

 shard splitting
 ---

 Key: SOLR-3755
 URL: https://issues.apache.org/jira/browse/SOLR-3755
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Yonik Seeley
 Attachments: SOLR-3755.patch, SOLR-3755.patch


 We can currently easily add replicas to handle increases in query volume, but 
 we should also add a way to add additional shards dynamically by splitting 
 existing shards.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-10 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550488#comment-13550488
 ] 

Dawid Weiss commented on LUCENE-4678:
-

This looks very cool! I looked at the patch briefly but I need to apply it to 
make sense of the whole picture. :) 
{code}
+  while (skip > 0) {
+    buffer.writeByte((byte) 0);
+    skip--;
+  }
{code}

This doesn't look particularly efficient, but I didn't get the context where 
it's actually used from the patch, so maybe it's all right.
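
If that loop does turn out to be hot, a chunked zero-fill would be the obvious 
alternative; a rough sketch, assuming the buffer exposes a DataOutput-style 
writeBytes(byte[], int, int):

{code}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

// Rough sketch, assuming a DataOutput-style buffer: write zeros a chunk
// at a time instead of one writeByte() call per skipped byte.
public class ZeroFill {
  private static final byte[] ZEROS = new byte[1024];

  static void skipZeros(DataOutput buffer, int skip) throws IOException {
    while (skip > 0) {
      int chunk = Math.min(skip, ZEROS.length);
      buffer.writeBytes(ZEROS, 0, chunk);
      skip -= chunk;
    }
  }
}
{code}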

 FST should use paged byte[] instead of single contiguous byte[]
 ---

 Key: LUCENE-4678
 URL: https://issues.apache.org/jira/browse/LUCENE-4678
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4678.patch, LUCENE-4678.patch


 The single byte[] we use today has several limitations, eg it limits us to < 
 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
 it causes big RAM spikes during building when the array has to grow.
 I took basically the same approach as LUCENE-3298, but I want to break out 
 this patch separately from changing all int -> long for > 2.1 GB support.
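
For illustration, a toy paged byte[] along these lines (a hypothetical class, 
not the patch's code): blocks are allocated on demand and addressed with a 
long, so growth appends one block instead of copying a huge array, and sizes 
aren't capped at 2.1 GB.

{code}
import java.util.ArrayList;
import java.util.List;

// Toy illustration (hypothetical, not the patch's code): fixed-size
// blocks allocated on demand. Long addresses remove the 2.1 GB int cap,
// and growth appends one 32 KB block instead of reallocating everything.
public class PagedByteArray {
  private static final int BLOCK_BITS = 15;            // 32 KB blocks
  private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
  private static final int BLOCK_MASK = BLOCK_SIZE - 1;

  private final List<byte[]> blocks = new ArrayList<byte[]>();
  private long pos;                                    // next write address

  public void writeByte(byte b) {
    int block = (int) (pos >>> BLOCK_BITS);
    if (block == blocks.size()) {
      blocks.add(new byte[BLOCK_SIZE]);
    }
    blocks.get(block)[(int) (pos & BLOCK_MASK)] = b;
    pos++;
  }

  public byte readByte(long address) {
    return blocks.get((int) (address >>> BLOCK_BITS))[(int) (address & BLOCK_MASK)];
  }
}
{code}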

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries

2013-01-10 Thread Roman Chyla (JIRA)
Roman Chyla created LUCENE-4679:
---

 Summary: LowercaseExpandedTermsQueryNodeProcessor changes regex 
queries
 Key: LUCENE-4679
 URL: https://issues.apache.org/jira/browse/LUCENE-4679
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Roman Chyla
Priority: Trivial


This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \\W should stay uppercase, 
but it will be lowercased.
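
A minimal sketch of the requested behavior, assuming the flexible query 
parser's node types from Lucene 4.x; this is an illustration, not the attached 
patch:

{code}
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.core.nodes.QueryNode;
import org.apache.lucene.queryparser.flexible.standard.nodes.RegexpQueryNode;
import org.apache.lucene.queryparser.flexible.standard.processors.LowercaseExpandedTermsQueryNodeProcessor;

// Hypothetical subclass that leaves regex nodes untouched, so an escape
// like \W keeps its case while wildcard/fuzzy terms are still lowercased.
public class RegexAwareLowercaseProcessor
    extends LowercaseExpandedTermsQueryNodeProcessor {
  @Override
  protected QueryNode postProcessNode(QueryNode node) throws QueryNodeException {
    if (node instanceof RegexpQueryNode) {
      return node;
    }
    return super.postProcessNode(node);
  }
}
{code}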





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries

2013-01-10 Thread Roman Chyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Chyla updated LUCENE-4679:


Attachment: LUCENE-4679.patch

 LowercaseExpandedTermsQueryNodeProcessor changes regex queries
 --

 Key: LUCENE-4679
 URL: https://issues.apache.org/jira/browse/LUCENE-4679
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Roman Chyla
Priority: Trivial
 Attachments: LUCENE-4679.patch


 This is really a very silly request, but could the lowercase processor 
 'abstain' from changing regex queries? For example, \\W should stay 
 uppercase, but it will be lowercased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries

2013-01-10 Thread Roman Chyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Chyla updated LUCENE-4679:


Description: 
This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \W should stay 
uppercase, but it will be lowercased.





  was:
This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \\W should stay uppercase, 
but it will be lowercased.






 LowercaseExpandedTermsQueryNodeProcessor changes regex queries
 --

 Key: LUCENE-4679
 URL: https://issues.apache.org/jira/browse/LUCENE-4679
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Roman Chyla
Priority: Trivial
 Attachments: LUCENE-4679.patch


 This is really a very silly request, but could the lowercase processor 
 'abstain' from changing regex queries? For example, \W should stay 
 uppercase, but it will be lowercased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries

2013-01-10 Thread Roman Chyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Chyla updated LUCENE-4679:


Description: 
This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \W should stay 
uppercase, but it is lowercased.





  was:
This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \W should stay 
uppercase, but it will be lowercased.






 LowercaseExpandedTermsQueryNodeProcessor changes regex queries
 --

 Key: LUCENE-4679
 URL: https://issues.apache.org/jira/browse/LUCENE-4679
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Roman Chyla
Priority: Trivial
 Attachments: LUCENE-4679.patch


 This is really a very silly request, but could the lowercase processor 
 'abstain' from changing regex queries? For example, \W should stay 
 uppercase, but it is lowercased.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550538#comment-13550538
 ] 

Hoss Man commented on LUCENE-4134:
--

bq. What other script did you have in mind for the maven files?

I just meant whatever we currently do to push them to wherever we push them 
once the VOTE is official -- if that's currently bundled up in a script that 
also scp's the files to people.apache.org:/dist, then let's only worry about 
changing the people.apache.org part to start committing to svn, and worry about 
switching to RCs in svn and how we upload to maven from there later.



 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir used for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3982) Admin UI: Various Dataimport Improvements

2013-01-10 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3982:


Attachment: SOLR-3982.patch

After a quick chat with [~elyograg], we decided to show the animated spinner 
only if auto-refresh is activated; otherwise the user might be confused.

 Admin UI: Various Dataimport Improvements
 -

 Key: SOLR-3982
 URL: https://issues.apache.org/jira/browse/SOLR-3982
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0
Reporter: Shawn Heisey
Assignee: Stefan Matheis (steffkes)
 Fix For: 4.2, 5.0

 Attachments: SOLR-3982.patch, SOLR-3982.patch, SOLR-3982.patch


 Started with Shawn's request for a small refresh link; one change leads to 
 the next, which is why I changed this issue into a more general one.
 This Patch brings:
 * A Refresh Status Button
 * An Abort Import Button
 * Improved Status-Handling 
 _(was buggy if you have multiple Cores with Handlers for Dataimport defined 
 and you switched the view while at least one was running)_
 * Additional Stats on Rows/Documents
 _(on-the-fly calculated X Docs/second)_
 * less buggy duration-to-readable-time conversion
 _(until now this resulted in NaNs showing up on your screen)_
 Original Description:
 {quote}The dataimport section under each core on the admin gui does not 
 provide a way to get the current import status.  I actually would like to see 
 it automatically pull the status as soon as you click on Dataimport ... I 
 have never seen an import status with a qtime above 1 millisecond.  A refresh 
 icon/link would be good to have as well.
 Additional note: the resulting URL in the address bar is a little odd:
 http://server:port/solr/#/corename/dataimport//dataimport{quote}
 Although I gave a short explanation of why the URL looks a bit odd:
 The first dataimport is required for the UI to detect which section you're 
 browsing .. the second /dataimport (including the slash, yes) is coming 
 from your solrconfig :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Jack Krupansky
The window of Monday through Wednesday sounds like a great target. Nothing 
says that the first RC has to be final. If whoever is doing the branch wants 
to do it on Monday rather than Tuesday, fine. If one or more of these nasty 
blockers gets fixed on Tuesday, we should still be open to a re-spin to 
put quality over a mere day or two of delay. But draw a hard line on 
Wednesday.


-- Jack Krupansky

-Original Message- 
From: Mark Miller

Sent: Thursday, January 10, 2013 3:36 PM
To: dev@lucene.apache.org
Subject: Re: 4.1 release

Saying tomorrow without any date that gives anyone any time to do anything 
is out of nowhere to me. People in Europe and east of that will wake up and 
find out, oh today. While pressure has been building towards a release, no 
one has proposed a date for a cutoff. I think that is always only fair. I 
think that if you were desperate to cut off to blockers tomorrow, you should 
have called for that last week.


Robert Muir's short term releases are not threatened by allowing people to 
plan and execute a release together. You can take that too far and do damage 
from the opposite direction. Giving people time to tie things up with a real 
deadline is only fair. We all know a nebulous deadline is not conducive to 
finishing up work.


I think all releases should have a known date that we agree on that gives 
developers some time to finish what they are working on or what they believe 
is important for the release. At a minimum there should be a few days for 
this. A weekend involved only seems fair. This doesn't have to be a long 
time, but it should not require we file blockers and just seems like a 
friendly way to develop together.


Monday is fine by me if others buy into it.

Otherwise, we have taken 4 or 5 months for 4.1. Let's not drag it out 
another month. But let's not do the reverse and release it tonight. The 
sensible approach always seems like we should plan out some target dates on 
the list - dates that actually give devs a chance to respond to - and then 
follow through on those dates.


- Mark

On Jan 10, 2013, at 3:26 PM, Steve Rowe sar...@gmail.com wrote:

Okay - I can see your logic, Mark, but this is not even close to out of 
nowhere.  You yourself have been vocal about making a 4.1 release for a 
couple weeks now.


I agree with Robert Muir that we should be promoting short turnaround 
releases.  If it doesn't make this release, it'll make the next one, which 
will come out in a relatively short span of time.  In this model, Blocker 
issues are the drivers, not Fix Version. If people want stuff in the 
release, they should mark their issue as Blocker.


How about a compromise - next Monday we branch and only allow Blockers to 
block the release?


Steve

On Jan 10, 2013, at 3:08 PM, Mark Miller markrmil...@gmail.com wrote:

-1 from me - I don't like not giving people a target date to clean things 
up by. No one has given a proposed date to try and tie things up by - 
just calling 'hike is tomorrow' out of nowhere doesn't seem right to me.


We have a lot of people working on this over a lot of timezones. I think 
we should do the right thing and give everyone at least a few days and a 
weekend to finish getting their issues into 4.1.


- Mark

On Jan 10, 2013, at 2:36 PM, Steve Rowe sar...@gmail.com wrote:


I'd like to start sooner than next Tuesday.

I propose to make the branch tomorrow, and only allow Blocker issues to 
hold up the release after that.


A release candidate should then be possible by the middle of next week.

Steve

On Jan 10, 2013, at 2:27 PM, Mark Miller markrmil...@gmail.com wrote:



On Jan 10, 2013, at 2:12 PM, Steve Rowe sar...@gmail.com wrote:


I'd like to release soon.  What else blocks this?


I think we should toss out a short term date (next tuesday?) for anyone 
to get in what they need for 4.1.


Then just consider blockers after branching?

Then release?

Objections, better ideas?

I think we should give a bit of time for people to finish up what's in 
flight or fix any blockers. Then we should heighten testing and allow 
for any new blockers, and then kick it out. If we need to do a 4.2 
shortly after, so be it.


- Mark




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org





[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550551#comment-13550551
 ] 

Steve Rowe commented on LUCENE-4134:


bq. personally i would prefer if we don't have a separate script for changing 
the maven files.
I'm not really sure what this tester is currently doing.

s/changing/checking/ ?

Here's what the maven artifact checking portion of the smoke tester currently 
does:

# Downloads the POM templates from the branch tag in Subversion (for later 
checking that all checked-in POM templates have corresponding artifacts)
# Downloads all the files under the {{maven/}} directories at the RC location
# Verifies that there is a deployed POM for each binary jar/war
# Verifies there is a binary jar for each POM template
# Verifies that the md5/sha1 digests for each Maven jar/war exist and are 
correct
# Verifies there is a source and javadocs jar for each binary jar
# Verifies that each deployed POM's artifactId/groupId (pulled from the POM) 
matches the POM's dir+filename
# Verifies that there is the binary jar for each deployed POM
# Downloads and unpacks the official distributions, and also unpacks the Solr 
war
# Verifies that the Maven binary artifacts have same-named files (after adding 
apache- to the Maven Solr jars/war)

There are a couple of additional steps in there to handle non-Mavenized 
dependencies, which we don't have any of anymore; these steps could be removed.
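
As a rough illustration of the digest verification in step 5 above 
(hypothetical code with made-up file names, not the smoke tester's actual 
implementation):

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

// Hypothetical sketch of step 5: recompute an artifact's MD5 and compare
// it to the first token of the adjacent .md5 file (file names made up).
public class DigestCheck {
  static boolean md5Matches(Path jar, Path md5File) throws Exception {
    byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(jar));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    String expected = new String(Files.readAllBytes(md5File), "UTF-8").trim().split("\\s+")[0];
    return hex.toString().equalsIgnoreCase(expected);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(md5Matches(Paths.get("solr-core-4.1.0.jar"),
                                  Paths.get("solr-core-4.1.0.jar.md5")));
  }
}
{code}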
  

bq. It's scary to me that different build systems are producing different 
artifacts

*All* the Maven artifacts are produced by Ant, not by Maven and not by 
maven-ant-tasks. 

bq. And i know the checking isn't good enough when i see basic shit like things 
not even named
the same way: SOLR-4287

maven-ant-tasks renames the Solr artifacts based on the Maven jar naming 
convention: artifactId-version(-type).jar - groupId org.apache.solr is not 
included.  This has been the Solr Maven artifact naming scheme since Solr 
artifacts started being published on the Maven central repository (v1.3).  
Using the Solr naming convention would result in the coordinates 
{{org.apache.solr.apache-solr.*}}, or maybe even 
{{org.apache.apache-solr:apache-solr.*}}, both of which look goofy to me.

I *think* Maven can technically handle artifact naming schemes that differ from 
artifactId-version(-type).jar, but I've never done that before, and I 
personally don't think it's worth the effort, especially given the IMHO goofy 
result.  Before SOLR-4287, I haven't seen anybody complain.  (If you look at 
SOLR-4287, by the way, the suggestion isn't to change Maven naming, it's to 
change the official Solr artifact naming.)  

 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir used for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550551#comment-13550551
 ] 

Steve Rowe edited comment on LUCENE-4134 at 1/10/13 11:06 PM:
--

bq. personally i would prefer if we don't have a separate script for changing 
the maven files.  I'm not really sure what this tester is currently doing.

s/changing/checking/ ?

Here's what the maven artifact checking portion of the smoke tester currently 
does:

# Downloads the POM templates from the branch tag in Subversion (for later 
checking that all checked-in POM templates have corresponding artifacts)
# Downloads all the files under the {{maven/}} directories at the RC location
# Verifies that there is a deployed POM for each binary jar/war
# Verifies there is a binary jar for each POM template
# Verifies that the md5/sha1 digests for each Maven jar/war exist and are 
correct
# Verifies there is a source and javadocs jar for each binary jar
# Verifies that each deployed POM's artifactId/groupId (pulled from the POM) 
matches the POM's dir+filename
# Verifies that there is the binary jar for each deployed POM
# Downloads and unpacks the official distributions, and also unpacks the Solr 
war
# Verifies that the Maven binary artifacts have same-named files (after adding 
apache- to the Maven Solr jars/war)

There are a couple of additional steps in there to handle non-Mavenized 
dependencies, which we don't have any of anymore; these steps could be removed.
  

bq. It's scary to me that different build systems are producing different 
artifacts

*All* the Maven artifacts are produced by Ant, not by Maven and not by 
maven-ant-tasks. 

bq. And i know the checking isn't good enough when i see basic shit like things 
not even named the same way: SOLR-4287

maven-ant-tasks renames the Solr artifacts based on the Maven jar naming 
convention: artifactId-version(-type).jar - groupId org.apache.solr is not 
included.  This has been the Solr Maven artifact naming scheme since Solr 
artifacts started being published on the Maven central repository (v1.3).  
Using the Solr naming convention would result in the coordinates 
{{org.apache.solr.apache-solr.\*}}, or maybe even 
{{org.apache.apache-solr:apache-solr.\*}}, both of which look goofy to me.

I *think* Maven can technically handle artifact naming schemes that differ from 
artifactId-version(-type).jar, but I've never done that before, and I 
personally don't think it's worth the effort, especially given the IMHO goofy 
result.  Before SOLR-4287, I haven't seen anybody complain.  (If you look at 
SOLR-4287, by the way, the suggestion isn't to change Maven naming, it's to 
change the official Solr artifact naming.)  

  was (Author: steve_rowe):
bq. personally i would prefer if we don't have a separate script for 
changing the maven files.
I'm not really sure what this tester is currently doing.

s/changing/checking/ ?

Here's what the maven artifact checking portion of the smoke tester currently 
does:

# Downloads the POM templates from the branch tag in Subversion (for later 
checking that all checked-in POM templates have corresponding artifacts)
# Downloads all the files under the {{maven/}} directories at the RC location
# Verifies that there is a deployed POM for each binary jar/war
# Verifies there is a binary jar for each POM template
# Verifies that the md5/sha1 digests for each Maven jar/war exist and are 
correct
# Verifies there is a source and javadocs jar for each binary jar
# Verifies that each deployed POM's artifactId/groupId (pulled from the POM) 
matches the POM's dir+filename
# Verifies that there is the binary jar for each deployed POM
# Downloads and unpacks the official distributions, and also unpacks the Solr 
war
# Verifies that the Maven binary artifacts have same-named files (after adding 
apache- to the Maven Solr jars/war)

There are a couple of additional steps in there to handle non-Mavenized 
dependencies, which we don't have any of anymore; these steps could be removed.
  

bq. It's scary to me that different build systems are producing different 
artifacts

*All* the Maven artifacts are produced by Ant, not by Maven and not by 
maven-ant-tasks. 

bq. And i know the checking isn't good enough when i see basic shit like things 
not even named
the same way: SOLR-4287

maven-ant-tasks renames the Solr artifacts based on the Maven jar naming 
convention: artifactId-version(-type).jar - groupId org.apache.solr is not 
included.  This has been the Solr Maven artifact naming scheme since Solr 
artifacts started being published on the Maven central repository (v1.3).  
Using the Solr naming convention would result in the coordinates 
{{org.apache.solr.apache-solr.*}}, or maybe even 
{{org.apache.apache-solr:apache-solr.*}}, both of which look goofy to me.

I *think* Maven can technically handle artifact naming 

[jira] [Comment Edited] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer edited comment on LUCENE-3178 at 1/10/13 11:08 PM:
---

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do memcpy in all cases, but often does CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards; so far the only speedups I have seen are 
when I ape most of the behaviors of the JVM. The biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes, so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is both maddening 
for me and even more damning.

  was (Author: gbow...@fastmail.co.uk):
{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do memcpy in all cases, but often does CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards; so far the only speedups I have seen are 
when I ape most of the behaviors of the JVM. The biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes, so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains

[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550555#comment-13550555
 ] 

Steve Rowe commented on LUCENE-4134:


{quote}
bq. What other script did you have in mind for the maven files?

I just meant whatever we currently do to push them to wherever we push them 
once the VOTE is official -- if that's currently bundled up in a script that 
also scp's the files to people.apache.org:/dist, then let's only worry about 
changing the people.apache.org part to start committing to svn, and worry about 
switching to RCs in svn and how we upload to maven from there later.
{quote}

The process is here: 
[http://wiki.apache.org/lucene-java/PublishMavenArtifacts].  It's a two-step 
process: first, an Ant task stages the artifacts to the Nexus repository at 
{{repository.apache.org}}.  Then, when the VOTE succeeds, the RM clicks a 
button on the Nexus web interface to publish them, and a few hours later they 
get synced to the Maven central repository.


 modify release process/scripts to use svn for rc/release publishing 
 (svnpubsub)
 ---

 Key: LUCENE-4134
 URL: https://issues.apache.org/jira/browse/LUCENE-4134
 Project: Lucene - Core
  Issue Type: Task
Reporter: Hoss Man
Priority: Blocker
 Fix For: 4.1


 By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
 entirely managed using svnpubsub ... our use of the Apache CMS for 
 lucene.apache.org puts us in compliance for our main website, but the dist 
 dir used for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


