Re: ivy.
Thanks guys. I'll try it out later and let you know how it works. Dawid

On Fri, Mar 30, 2012 at 11:50 PM, Uwe Schindler u...@thetaphi.de wrote:
It can also go into ivysettings.xml (see example: http://ant.apache.org/ivy/history/latest-milestone/settings.html), and you can pass via properties where this file is, if it is not on the classpath. Uwe
-
Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

-----Original Message-----
From: Greg Bowyer [mailto:gbow...@fastmail.co.uk] Sent: Friday, March 30, 2012 11:43 PM To: dev@lucene.apache.org Subject: Re: ivy.

I am pretty sure this needs to go into the ivy.xml in Lucene proper.

On 30/03/12 13:46, Dawid Weiss wrote:
Ah, that's what I was looking for. But where do I put it? Can we set it globally in Lucene so that others (who have .m2 repos) can make immediate use of their preloaded artefacts? D.

On Fri, Mar 30, 2012 at 10:44 PM, Greg Bowyer gbow...@fastmail.co.uk wrote:
You can get ivy to treat the local maven repo as a resolver host. I think the required config is along the lines of:

%
<resolvers>
  <filesystem name="local-maven-2" m2compatible="true" force="false" local="true">
    <artifact pattern="${gerald.repo.dir}/[organisation]/[module]/[revision]/[module]-[revision].[ext]"/>
    <ivy pattern="${gerald.repo.dir}/[organisation]/[module]/[revision]/[module]-[revision].pom"/>
  </filesystem>
  ...
  <chain name="whatever" dual="true" checkmodified="true" changingPattern=".*SNAPSHOT">
    <resolver ref="local-maven-2"/>
    <resolver ref="apache-snapshot"/>
    <resolver ref="maven2"/>
    ...
  </chain>
</resolvers>
%

-- Greg

On 30/03/12 13:27, Dawid Weiss wrote:
But honestly, I have no idea how ivy works. It's just like ant to me. I just hack and hack and hack until it works.

You're a live randomized solver!
Dawid

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3940) When Japanese (Kuromoji) tokenizer removes a punctuation token it should leave a hole
[ https://issues.apache.org/jira/browse/LUCENE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3940:
Attachment: LUCENE-3940.patch

New patch, fixing a bug in the last one and adding a few more test cases. I also made the printing of the curious string on exception from BTSTC more ASCII-friendly. I think it's ready.

When Japanese (Kuromoji) tokenizer removes a punctuation token it should leave a hole
Key: LUCENE-3940 URL: https://issues.apache.org/jira/browse/LUCENE-3940 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 4.0 Attachments: LUCENE-3940.patch, LUCENE-3940.patch

I modified BaseTokenStreamTestCase to assert that the start/end offsets match for graph (posLen > 1) tokens, and this caught a bug in Kuromoji when the decompounding of a compound token has a punctuation token that's dropped. In this case we should leave hole(s) so that the graph is intact, i.e. the graph should look the same as if the punctuation tokens were not initially removed, but then a StopFilter had removed them. This also affects tokens that have no compound over them, i.e. we fail to leave a hole today when we remove the punctuation tokens. I'm not sure this is serious enough to warrant fixing in 3.6 at the last minute...

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (LUCENE-3932) Improve load time of .tii files
[ https://issues.apache.org/jira/browse/LUCENE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243102#comment-13243102 ]

Michael McCandless commented on LUCENE-3932:

bq. Is the space savings of delta encoding worth the processing time? You could write the .tii file to disk such that on open you could read it straight into a byte[].

This is actually what we do in 4.0's default codec (the index is an FST). It is tempting to do that in 3.x (if we were to do another 3.x release after 3.6)... we'd need to alter other things as well, e.g. the term bytes are also delta-coded in the file but not in RAM. I'm curious how much larger it'd be if we stopped delta coding... for your case, how large is the byte[] in RAM (just call dataPagedBytes.getPointer(), just before we freeze it, and print that result) vs. the .tii on disk...?

Improve load time of .tii files
Key: LUCENE-3932 URL: https://issues.apache.org/jira/browse/LUCENE-3932 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.5 Environment: Linux Reporter: Sean Bridges Attachments: LUCENE-3932.trunk.patch, perf.csv

We have a large 50 GB index which is optimized as one segment, with a 66 MB .tii file. This index has no norms and no field cache. It takes about 5 seconds to load this index; profiling reveals that 60% of the time is spent in GrowableWriter.set(index, value), and most of the time in set(...) is spent resizing PackedInts.Mutable current. In the constructor for TermInfosReaderIndex, you initialize the writer with the line, {quote}GrowableWriter indexToTerms = new GrowableWriter(4, indexSize, false);{quote} For our index, using four as the bit estimate results in 27 resizes.
The last value in indexToTerms is going to be ~tiiFileLength, and if instead you use, {quote}int bitEstimate = (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2)); GrowableWriter indexToTerms = new GrowableWriter(bitEstimate, indexSize, false);{quote} load time improves to ~2 seconds.
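The patch's arithmetic can be checked in isolation: the values stored in indexToTerms top out near tiiFileLength, so starting GrowableWriter at roughly ceil(log2(tiiFileLength)) bits avoids the ~27 resizes from a 4-bit start. A standalone sketch (the class name is illustrative; the 66 MB figure is from the report above):

```java
public class BitEstimate {
    // Estimate how many bits are needed to represent values up to
    // tiiFileLength: ceil(log2(tiiFileLength)), computed via log10.
    static int bitEstimate(long tiiFileLength) {
        return (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
    }

    public static void main(String[] args) {
        // A ~66 MB .tii file needs about 27 bits per stored offset.
        System.out.println(bitEstimate(66L * 1024 * 1024));
    }
}
```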
Re: conditional High Freq Terms in Lucene index
One big problem is your collector (that gathers all A doc IDs) is not mapping the per-segment docID to the top-level global docID space. You need to save the docBase that was passed to setNextReader, and then add it back in on each collect call.

Mike McCandless http://blog.mikemccandless.com

On Fri, Mar 30, 2012 at 7:23 PM, starz10de farag_ah...@yahoo.com wrote:
Thanks for your hint. I tried a simple solution as follows: first I determine the documents of type "A" and store them in an array by searching the document type field in the index:

public static void doStreamingSearch(final Searcher searcher, Query query) throws IOException {
  Collector streamingHitCollector = new Collector() {
    // simply print docId and score of every matching document
    @Override
    public void collect(int doc) throws IOException {
      c++;
      doc_id.add(doc + "");
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
      return true;
    }

    @Override
    public void setNextReader(IndexReader arg0, int arg1) throws IOException {
      // TODO Auto-generated method stub
    }

    @Override
    public void setScorer(Scorer arg0) throws IOException {
      // TODO Auto-generated method stub
    }
  };
  searcher.search(query, streamingHitCollector);
}

Then I modified the HighFreqTerms code in Lucene as follows:

while (terms.next()) {
  dok.seek(terms);
  while (dok.next()) {
    for (int i = 0; i < doc_id.size(); ++i) {
      if (doc_id.get(i).equals(dok.doc() + "")) {
        if (terms.term().field().equals(field)) {
          tiq.insertWithOverflow(new TermInfo(terms.term(), dok.freq()));
        }
      }

I could verify that I correctly have only the documents of type "A". However, the result is not correct, because I can see a few terms twice in the ordered high-frequency list. Any hints where the problem is?

Michael McCandless-2 wrote: You'd have to modify HighFreqTerms' sources... Roughly...
First, make a bitset recording which docs are type A (e.g., use FieldCache); second, change HighFreqTerms so that for each term, it walks the postings, counting how many type A docs there were; then... just use the rest of HighFreqTerms (priority queue, etc.).

Mike McCandless http://blog.mikemccandless.com

On Thu, Mar 29, 2012 at 11:33 AM, starz10de farag_ahmed@ wrote:
Hi, I am using the HighFreqTerms class to compute the high-frequency terms in the Lucene index and it works well. However, I am interested in computing the high-frequency terms under some condition: not for all documents in the index, but only for documents of type "A". Besides the "contents" field in the index I also have a "DocType" (document type) field as an extra field. So I should compute the high-frequency terms only if DocType="A". Any idea how to do this? Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3868066.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
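Mike's suggested approach — a bitset of the type-A docs, then a per-term walk over the postings counting only type-A hits — can be illustrated with plain Java collections standing in for Lucene's FieldCache and TermDocs. The data below is made up for illustration:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeAFreqDemo {
    // Count how many of a term's postings fall on type-"A" documents.
    static int typeAFreq(BitSet typeA, int[] postings) {
        int freq = 0;
        for (int doc : postings) {
            if (typeA.get(doc)) {
                freq++;   // only count docs flagged as type A
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        BitSet typeA = new BitSet();
        typeA.set(0);
        typeA.set(2);   // docs 0 and 2 are type "A"

        // term -> docIDs containing it (a stand-in for the postings lists)
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("lucene", new int[] {0, 1, 2});
        postings.put("solr", new int[] {1});

        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            System.out.println(e.getKey() + " " + typeAFreq(typeA, e.getValue()));
        }
    }
}
```

Each term is counted exactly once per document this way, which also avoids the duplicate-terms symptom the inner id-list loop can produce.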
[jira] [Commented] (LUCENE-3939) ClassCastException thrown in the map(String,int,TermVectorOffsetInfo[],int[]) method in org.apache.lucene.index.SortedTermVectorMapper
[ https://issues.apache.org/jira/browse/LUCENE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243108#comment-13243108 ]

Michael McCandless commented on LUCENE-3939:

I'm confused about how something that's not a TermVectorEntry can get into the termToTVE map... can you post a small test case showing this problem?

ClassCastException thrown in the map(String,int,TermVectorOffsetInfo[],int[]) method in org.apache.lucene.index.SortedTermVectorMapper
Key: LUCENE-3939 URL: https://issues.apache.org/jira/browse/LUCENE-3939 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 3.0.2, 3.1, 3.4, 3.5 Reporter: SHIN HWEI TAN Original Estimate: 0.05h Remaining Estimate: 0.05h

The map method in the SortedTermVectorMapper class does not check the parameter term for valid values. It throws ClassCastException when called with an invalid string for the parameter term (i.e., var3.map("*", (-1), null, null)). The exception thrown is due to an explicit cast (i.e., casting the return value of termToTVE.get(term) to type TermVectorEntry). Suggested fix: replace the beginning of the method body for the class SortedTermVectorMapper by changing it like this:

public void map(String term, int frequency, TermVectorOffsetInfo[] offsets, int[] positions) {
  if (termToTVE.get(term) instanceof TermVectorEntry) {
    TermVectorEntry entry = (TermVectorEntry) termToTVE.get(term);
    ...
  }
}
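The suggested fix is an instanceof guard before the cast. A self-contained illustration of that pattern, with Integer standing in for Lucene's TermVectorEntry (the class and method names here are hypothetical):

```java
public class InstanceofGuardDemo {
    // Check instanceof before casting so an unexpected (or absent) map
    // value is skipped instead of throwing ClassCastException.
    static Integer asIntegerOrNull(Object v) {
        if (v instanceof Integer) {   // also false when v is null
            return (Integer) v;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(asIntegerOrNull(42));            // 42
        System.out.println(asIntegerOrNull("not an int"));  // null
        System.out.println(asIntegerOrNull(null));          // null
    }
}
```

Note that instanceof is false for null, so the same guard also covers the case where the term is simply absent from the map.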
[jira] [Updated] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3738:
Attachment: LUCENE-3738-improvement.patch

After looking a while at the code, I have a further minor improvement. The most common case (int < 128) now exits directly after reading the byte, without any OR or variable assignment operations. Mike: Can you look at it and maybe do a quick test? I would like to commit this to both branches this evening.

Be consistent about negative vInt/vLong
Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch

Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!).

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
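For context on why a negative int costs 5 bytes: a vInt stores 7 payload bits per byte with the high bit as a continuation flag, so the set sign bit of a negative int forces the maximum five bytes. A minimal sketch of the encoder (a simplified stand-in, not Lucene's actual DataOutput source):

```java
import java.io.ByteArrayOutputStream;

public class VIntDemo {
    // Encode an int in vInt form: 7 low bits per byte, high bit set
    // when more bytes follow.
    static byte[] writeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);  // low 7 bits + continuation flag
            i >>>= 7;                      // unsigned shift, so a negative
        }                                  // int takes all 5 bytes
        out.write(i);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeVInt(100).length);  // 1
        System.out.println(writeVInt(300).length);  // 2
        System.out.println(writeVInt(-1).length);   // 5: the trap
    }
}
```

The same logic applied to a long explains the quiet 9-byte cost for a negative vLong: 64 sign-extended bits divided by 7 bits per byte rounds up to ten 7-bit groups, nine of which carry payload plus a final byte.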
[jira] [Updated] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3738:
Attachment: (was: LUCENE-3738.patch)
[jira] [Reopened] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reopened LUCENE-3738:
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243125#comment-13243125 ]

Michael McCandless commented on LUCENE-3738:

Thanks Uwe, I'll test!
[jira] [Commented] (LUCENE-3940) When Japanese (Kuromoji) tokenizer removes a punctuation token it should leave a hole
[ https://issues.apache.org/jira/browse/LUCENE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243126#comment-13243126 ]

Robert Muir commented on LUCENE-3940:

I don't think we should do this. StandardTokenizer doesn't leave holes when it drops punctuation; I think holes should only be left for real 'words', for the most part.
[JENKINS] Solr-trunk - Build # 1811 - Still Failing
Build: https://builds.apache.org/job/Solr-trunk/1811/ 1 tests failed. FAILED: org.apache.solr.TestDistributedSearch.testDistribSearch Error Message: Uncaught exception by thread: Thread[Thread-733,5,] Stack Trace: org.apache.lucene.util.UncaughtExceptionsRule$UncaughtExceptionsInBackgroundThread: Uncaught exception by thread: Thread[Thread-733,5,] at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:84) at org.apache.lucene.util.LuceneTestCase$SaveThreadAndTestNameRule$1.evaluate(LuceneTestCase.java:642) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:63) at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75) at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:38) at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:69) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743) Caused by: java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:32132/solr at org.apache.solr.TestDistributedSearch$1.run(TestDistributedSearch.java:396) Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:32132/solr at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:361) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:209) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:312) at org.apache.solr.TestDistributedSearch$1.run(TestDistributedSearch.java:391) Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to localhost:32132 timed out at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:125) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148) at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150) at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732) 
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:304) ... 4 more

Build Log (for compile errors): [...truncated 10479 lines...]
Re: conditional High Freq Terms in Lucene index
I revised it including your comment:

private Scorer scorer;
private int docBase;

// simply print docId and score of every matching document
@Override
public void collect(int doc) throws IOException {
  String k = doc + "";
  String k1 = docBase + "";
  doc_ids.add(k + k1);
}

@Override
public boolean acceptsDocsOutOfOrder() {
  return true;
}

@Override
public void setNextReader(IndexReader reader, int docBase) throws IOException {
  this.docBase = docBase;
}

@Override
public void setScorer(Scorer scorer) throws IOException {
  this.scorer = scorer;
}

I could see in HighFreqTerms that the condition for document type A is applied. However, the high-frequency terms are not computed correctly; I still see duplicate terms in the list, besides wrong occurrence counts. Here is how I do it:

TermInfoQueue tiq = new TermInfoQueue(numTerms);
TermEnum terms = reader.terms();
TermDocs dok = null;
int k = 0;
dok = reader.termDocs();
if (field != null) {
  while (terms.next()) {
    k = 0;
    dok.seek(terms);
    while (dok.next()) {
      for (int i = 0; i < doc_ids.size(); ++i) {
        if (categorization_based_on_year.doc_ids.get(i).equals(dok.doc() + "")) {
          // here I can see that only doc ids of type A are printed
          System.out.println(dok.doc());
          if (terms.term().field().equals(field)) {
            tiq.insertWithOverflow(new TermInfo(terms.term(), dok.freq()));
          }
          i = 1;
        }
      }
      .
      .
      .

Any hint?

--
View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873362.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243138#comment-13243138 ]

Michael McCandless commented on LUCENE-3738:

Alas, the results are now all over the place! And I went back to the prior patch and tried to reproduce the above results... and the results are still all over the place. I think we are chasing Java ghosts at this point...
Re: conditional High Freq Terms in Lucene index
Hmm, you are adding two strings. You should first add the two ints (docBase + doc), then convert that to a string.

Mike McCandless http://blog.mikemccandless.com

On Sat, Mar 31, 2012 at 8:56 AM, starz10de farag_ah...@yahoo.com wrote:
I revised it including your comment: [quoted message above]
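Mike's point can be shown in isolation: concatenating the two numbers as strings produces a key like "7100" rather than a docID, while adding the ints first yields the real top-level docID. The numbers below are made up for illustration; in the real collector, docBase comes from setNextReader:

```java
public class DocBaseDemo {
    // Add docBase and the per-segment doc as ints, then stringify
    // only if a String key is really needed.
    static String globalKey(int docBase, int doc) {
        return String.valueOf(docBase + doc);
    }

    public static void main(String[] args) {
        int docBase = 100;  // base of this segment in the top-level space
        int doc = 7;        // per-segment docID passed to collect()

        String wrong = doc + "" + docBase;  // string concatenation: "7100"
        String right = globalKey(docBase, doc);

        System.out.println(wrong);  // 7100
        System.out.println(right);  // 107
    }
}
```

Because two different (docBase, doc) pairs can concatenate to the same string, the string-built keys also explain the duplicate terms seen in the output.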
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243143#comment-13243143 ]

Uwe Schindler commented on LUCENE-3738:

What does your comment mean? Good or bad?
[jira] [Commented] (LUCENE-3774) check-legal isn't doing its job
[ https://issues.apache.org/jira/browse/LUCENE-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243153#comment-13243153 ] Yonik Seeley commented on LUCENE-3774: -- bq. I have a different view on this. Things like this (license checking) are typically integration tests. Having them per-module only complicates build files and is an unnecessary overhead for running normal tests (because dependencies change rarely). +1 Having been bit by the changes in this issue dozens of times already, we shouldn't be doing these checks on a normal ant test. Seems like it should be fine to let Jenkins test it.
* SolrCloud demo instructions that have you make a copy of example to example2, etc.
* mv build build.old so I could compare two runs
* try out a new jar locally w/o dotting all the i's
I've seen users report these errors on the mailing list too, and it's not apparent to them what the issue is. check-legal isn't doing its job --- Key: LUCENE-3774 URL: https://issues.apache.org/jira/browse/LUCENE-3774 Project: Lucene - Java Issue Type: Improvement Components: general/build Affects Versions: 3.6, 4.0 Reporter: Steven Rowe Assignee: Dawid Weiss Fix For: 3.6, 4.0 Attachments: LUCENE-3774.patch, LUCENE-3774.patch, LUCENE-3774.patch, LUCENE-3774.patch, LUCENE-3774.patch, LUCENE-3774.patch, LUCENE3774.patch, backport.patch In trunk, the {{check-legal-lucene}} ant target is not checking any {{lucene/contrib/\*\*/lib/}} directories; the {{modules/**/lib/}} directories are not being checked; and {{check-legal-solr}} can't be checking {{solr/example/lib/\*\*/\*.jar}}, because there are currently {{.jar}} files in there that don't have a license. These targets are set up to take in a full list of {{lib/}} directories in which to check, but modules move around, and these lists are not being kept up-to-date.
Instead, {{check-legal-\*}} should run for each module, if the module has a {{lib/}} directory, and it should be specialized for modules that have more than one ({{solr/core/}}) or that have a {{lib/}} directory in a non-standard place ({{lucene/core/}}). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3774) check-legal isn't doing its job
[ https://issues.apache.org/jira/browse/LUCENE-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243160#comment-13243160 ] Robert Muir commented on LUCENE-3774: - I agree too I think; it's worse now that checking licenses means we have to resolve first, to ensure the jars actually exist. This adds overhead; maybe Jenkins is good enough? It runs many times a day and we don't actually change jars that often: most of the time when developing we are just changing code... check-legal isn't doing its job --- Key: LUCENE-3774 URL: https://issues.apache.org/jira/browse/LUCENE-3774 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3774) check-legal isn't doing its job
[ https://issues.apache.org/jira/browse/LUCENE-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243162#comment-13243162 ] Dawid Weiss commented on LUCENE-3774: - I'm for pushing it to the top level. This will simplify handling of exceptional patterns and such too. Shouldn't be much of a problem to move it too. check-legal isn't doing its job --- Key: LUCENE-3774 URL: https://issues.apache.org/jira/browse/LUCENE-3774 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #443: POMs out of sync
I tried this seed on my 4-core Windows machine several times but no failure. This test failure might indicate that the DIH threading bugs aren't really fixed in 3.6. On the other hand, users of DIH threads on 3.6 will get a deprecation warning, the wiki discourages it and the feature is gone in 4.0. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] Sent: Saturday, March 31, 2012 8:45 AM To: dev@lucene.apache.org Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-3.x #443: POMs out of sync Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/443/ 1 tests failed. REGRESSION: org.apache.solr.handler.dataimport.TestThreaded.testCachedThread_FullImport Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:409) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:376) at org.apache.solr.handler.dataimport.TestThreaded.verify(TestThreaded.java:73) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:36) at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:61) at org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:630) at 
org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:536) at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:67) at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:457) at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:74) at org.apache.lucene.util.LuceneTestCase$SaveThreadAndTestNameRule$1.evaluate(LuceneTestCase.java:508) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:146) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:61) at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:74) at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:36) at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:67) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110) at
Re: conditional High Freq Terms in Lucene index
I did as you mentioned and the problem is still the same; I think the problem is in the highFrequentTerm part. There I see duplicate words in the produced high-frequency list. The comparison itself is OK, because I can see that only terms belonging to document type A are added to the TermInfoQueue. However, the frequency is not counted correctly for each term, and there are also some duplicate words in the list. Is something wrong with TermDocs dok and dok.freq()? -- View this message in context: http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873567.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
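From the loop quoted in the earlier message, one plausible cause of both symptoms is that tiq.insertWithOverflow runs inside the document loop, so a term gets inserted once per matching document with that single document's freq, rather than once with a total. A self-contained sketch of the intended aggregation, with plain Java maps standing in for TermEnum/TermDocs (all names here are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ConditionalFreq {
    // postings: term -> (docId -> frequency of the term in that doc).
    // allowedDocs: the "type A" documents. Sum per term over allowed docs only,
    // and record each term exactly once, after its doc loop finishes.
    static Map<String, Integer> countInAllowedDocs(Map<String, Map<Integer, Integer>> postings,
                                                   Set<Integer> allowedDocs) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> term : postings.entrySet()) {
            int total = 0;
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                if (allowedDocs.contains(posting.getKey())) {
                    total += posting.getValue();
                }
            }
            if (total > 0) {
                totals.put(term.getKey(), total); // one entry per term, summed freq
            }
        }
        return totals;
    }
}
```

In the Lucene 3.x loop this would correspond to summing dok.freq() across the matching docs and calling tiq.insertWithOverflow once per term, after the while (dok.next()) loop ends.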
[jira] [Commented] (SOLR-2202) Money/Currency FieldType
[ https://issues.apache.org/jira/browse/SOLR-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243209#comment-13243209 ] Jan Høydahl commented on SOLR-2202: --- Thanks for sorting this out! Money/Currency FieldType Key: SOLR-2202 URL: https://issues.apache.org/jira/browse/SOLR-2202 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.5 Reporter: Greg Fodor Assignee: Jan Høydahl Fix For: 3.6, 4.0 Attachments: SOLR-2022-solr-3.patch, SOLR-2202-3x-stabilize-provider-interface.patch, SOLR-2202-fix-NPE-if-no-tlong-fieldType.patch, SOLR-2202-lucene-1.patch, SOLR-2202-no-fieldtype-deps.patch, SOLR-2202-solr-1.patch, SOLR-2202-solr-10.patch, SOLR-2202-solr-2.patch, SOLR-2202-solr-4.patch, SOLR-2202-solr-5.patch, SOLR-2202-solr-6.patch, SOLR-2202-solr-7.patch, SOLR-2202-solr-8.patch, SOLR-2202-solr-9.patch, SOLR-2202.patch, SOLR-2202.patch, SOLR-2202.patch, SOLR-2202.patch, SOLR-2202.patch, SOLR-2202.patch, SOLR-2202.patch, SOLR-2202.patch Provides support for monetary values to Solr/Lucene with query-time currency conversion. The following features are supported:
- Point queries
- Range queries
- Sorting
- Currency parsing by either currency code or symbol.
- Symmetric and asymmetric exchange rates. (Asymmetric exchange rates are useful if there are fees associated with exchanging the currency.)
At indexing time, money fields can be indexed in a native currency. For example, if a product on an e-commerce site is listed in Euros, indexing the price field as 1000,EUR will index it appropriately. By altering the currency.xml file, the sorting and querying against Solr can take into account fluctuations in currency exchange rates without having to re-index the documents. The new money field type is a polyfield which indexes two fields, one which contains the amount of the value and another which contains the currency code or symbol. The currency metadata (names, symbols, codes, and exchange rates) are expected to be in an XML file which is pointed to by the field type declaration in the schema.xml. The current patch is factored such that Money utility functions and configuration metadata lie in Lucene (see MoneyUtil and CurrencyConfig), while the MoneyType and MoneyValueSource lie in Solr. This was meant to mirror the work being done on the spatial field types. This patch will be getting used to power the international search capabilities of the search engine at Etsy. Also see WIKI page: http://wiki.apache.org/solr/MoneyFieldType -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1052) Deprecate/Remove indexDefaults and mainIndex in favor of indexConfig in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243219#comment-13243219 ] Jan Høydahl commented on SOLR-1052: --- The patch uses {{<lockType>single</lockType>}} for all tests, which should be suitable for all platforms. Tests pass for me; would love to have another pair of eyes on it too. Deprecate/Remove indexDefaults and mainIndex in favor of indexConfig in solrconfig.xml Key: SOLR-1052 URL: https://issues.apache.org/jira/browse/SOLR-1052 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Jan Høydahl Labels: solrconfig.xml Fix For: 3.6, 4.0 Attachments: SOLR-1052-3x-fix-tests.patch, SOLR-1052-3x.patch, SOLR-1052-3x.patch, SOLR-1052-3x.patch, SOLR-1052-3x.patch, SOLR-1052.patch Given that we now handle multiple cores via the solr.xml and the discussion around indexDefaults and mainIndex at http://www.lucidimagination.com/search/p:solr?q=mainIndex+vs.+indexDefaults we should deprecate the old indexDefaults and mainIndex sections and only use a new indexConfig section.
3.6: Deprecation warning if old section used
4.0: If LuceneMatchVersion before LUCENE_40 then warn (so old configs will work), else fail fast
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3939) ClassCastException thrown in the map(String,int,TermVectorOffsetInfo[],int[]) method in org.apache.lucene.index.SortedTermVectorMapper
[ https://issues.apache.org/jira/browse/LUCENE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243243#comment-13243243 ] SHIN HWEI TAN commented on LUCENE-3939: --- Thanks for the comment. Below is a test case that illustrates the problem: the second invocation of the method map throws ClassCastException although it is expected to run normally without any exception.

  org.apache.lucene.index.SortedTermVectorMapper var3 = new org.apache.lucene.index.SortedTermVectorMapper(false, false, (java.util.Comparator) null);
  var3.setExpectations("", 0, false, false);
  org.apache.lucene.index.TermVectorOffsetInfo[] var11 = new org.apache.lucene.index.TermVectorOffsetInfo[] { };
  var3.map("", (-1), var11, (int[]) null);
  var3.map("*", (-1), (org.apache.lucene.index.TermVectorOffsetInfo[]) null, (int[]) null);

ClassCastException thrown in the map(String,int,TermVectorOffsetInfo[],int[]) method in org.apache.lucene.index.SortedTermVectorMapper -- Key: LUCENE-3939 URL: https://issues.apache.org/jira/browse/LUCENE-3939 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 3.0.2, 3.1, 3.4, 3.5 Reporter: SHIN HWEI TAN Original Estimate: 0.05h Remaining Estimate: 0.05h The method map in the SortedTermVectorMapper class does not check the parameter term for valid values. It throws ClassCastException when called with an invalid string for the parameter term (i.e., var3.map("*", (-1), null, null)). The exception thrown is due to an explicit cast (i.e., casting the return value of termToTVE.get(term) to type TermVectorEntry). Suggested fix: replace the beginning of the method body in SortedTermVectorMapper like this:

  public void map(String term, int frequency, TermVectorOffsetInfo[] offsets, int[] positions) {
    if (termToTVE.get(term) instanceof TermVectorEntry) {
      TermVectorEntry entry = (TermVectorEntry) termToTVE.get(term);
      ...
    }
  }

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243265#comment-13243265 ] Uwe Schindler commented on LUCENE-3738: --- Mike, I was away from home and did not understand your comment; now it's clear: you cannot reproduce the speedup from the last patch, nor can you see a difference with the current patch. I would suggest that I commit this now to trunk, we test a few nights, and then commit it to 3.x (Robert needs to backport Ivy to 3.6, so we have some time). I will commit this later before going to sleep, so we see results tomorrow. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3930) nuke jars from source tree and use ivy
[ https://issues.apache.org/jira/browse/LUCENE-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243266#comment-13243266 ] Hoss Man commented on LUCENE-3930: -- I did some testing of the packages built using trunk (circa r1307608)...
* we don't ship solr's build.xml (or any of the sub-build.xml files) in the binary artifacts, and with these changes most of the new ivy.xml files are also excluded -- but for some reason these newly added files are showing up, we should probably figure out why and exclude them as well since they aren't usable and could easily confuse people...
** ./example/example-DIH/ivy.xml
** ./example/example-DIH/build.xml
** ./example/ivy.xml
** ./example/build.xml
* the lib's for test-framework (ant, ant-junit, and junit) aren't being included in the lucene binary artifacts ... for the ant jars this might be OK (test-framework doesn't actually have any run-time deps on anything in ant, does it?) but it seems like the junit jar should be included, since including lucene-test-framework.jar in your classpath is useless w/o also including junit
* ant ivy-bootstrap followed by ant test using the lucene source package (lucene-4.0-SNAPSHOT-src.tgz) produces a build failure -- but this may have been a problem even before ivy (note the working dir and the final error)...
{noformat} hossman@bester:~/tmp/ivy-pck-testing/lu/src/lucene-4.0-SNAPSHOT$ ant test ... [junit] Testsuite: org.apache.lucene.util.junitcompat.TestReproduceMessage [junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.114 sec [junit] test: compile-lucene-core: jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: ivy-availability-check: ivy-fail: resolve: [ivy:retrieve] :: loading settings :: url = jar:file:/home/hossman/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled.
[echo] clover: common.compile-core: [javac] Compiling 1 source file to /home/hossman/tmp/ivy-pck-testing/lu/src/lucene-4.0-SNAPSHOT/build/core/classes/java compile-core: compile-test-framework: ivy-availability-check: ivy-fail: resolve: [ivy:retrieve] :: loading settings :: url = jar:file:/home/hossman/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml init: compile-lucene-core: compile-core: compile-test: [echo] Building demo... ivy-availability-check: ivy-fail: resolve: [ivy:retrieve] :: loading settings :: url = jar:file:/home/hossman/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml common.init: compile-lucene-core: contrib-build.init: check-lucene-core-uptodate: jar-lucene-core: jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: ivy-availability-check: ivy-fail: resolve: [ivy:retrieve] :: loading settings :: url = jar:file:/home/hossman/.ant/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [javac] Compiling 1 source file to /home/hossman/tmp/ivy-pck-testing/lu/src/lucene-4.0-SNAPSHOT/build/core/classes/java compile-core: jar-core: [jar] Building jar: /home/hossman/tmp/ivy-pck-testing/lu/src/lucene-4.0-SNAPSHOT/build/core/lucene-core-4.0-SNAPSHOT.jar init: compile-test: [echo] Building demo... 
check-analyzers-common-uptodate: jar-analyzers-common: BUILD FAILED /home/hossman/tmp/ivy-pck-testing/lu/src/lucene-4.0-SNAPSHOT/build.xml:487: The following error occurred while executing this line: /home/hossman/tmp/ivy-pck-testing/lu/src/lucene-4.0-SNAPSHOT/common-build.xml:1026: The following error occurred while executing this line: /home/hossman/tmp/ivy-pck-testing/lu/src/lucene-4.0-SNAPSHOT/contrib/contrib-build.xml:58: The following error occurred while executing this line: /home/hossman/tmp/ivy-pck-testing/lu/src/lucene-4.0-SNAPSHOT/common-build.xml:551: Basedir /home/hossman/tmp/ivy-pck-testing/lu/src/modules/analysis/common does not exist Total time: 5 minutes 10 seconds {noformat} ...it's trying to reach back up out of the working directory into ../modules nuke jars from source tree and use ivy -- Key: LUCENE-3930 URL: https://issues.apache.org/jira/browse/LUCENE-3930 Project: Lucene - Java Issue Type: Task Components: general/build Reporter: Robert Muir Assignee: Robert Muir Priority: Blocker Fix For: 3.6 Attachments: LUCENE-3930-skip-sources-javadoc.patch, LUCENE-3930-solr-example.patch, LUCENE-3930-solr-example.patch, LUCENE-3930.patch, LUCENE-3930.patch, LUCENE-3930.patch,
[jira] [Updated] (LUCENE-3930) nuke jars from source tree and use ivy
[ https://issues.apache.org/jira/browse/LUCENE-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-3930: - Attachment: LUCENE-3930_includetestlibs_excludeexamplexml.patch patch fixing the first two problems i mentioned above:
* categorically exclude build.xml and ivy.xml files from solr binary packages (to prevent the ones under example from being included)
* add parity to what files under test-framework get included, in line with how contrib is treated (new patterns try to match some things that don't exist in test-framework, but i don't think that's bad -- future-proofs us)
nuke jars from source tree and use ivy -- Key: LUCENE-3930 URL: https://issues.apache.org/jira/browse/LUCENE-3930 Project: Lucene - Java Issue Type: Task Components: general/build Reporter: Robert Muir Assignee: Robert Muir Priority: Blocker Fix For: 3.6 Attachments: LUCENE-3930-skip-sources-javadoc.patch, LUCENE-3930-solr-example.patch, LUCENE-3930-solr-example.patch, LUCENE-3930.patch, LUCENE-3930.patch, LUCENE-3930.patch, LUCENE-3930__ivy_bootstrap_target.patch, LUCENE-3930_includetestlibs_excludeexamplexml.patch, ant_-verbose_clean_test.out.txt, noggit-commons-csv.patch, patch-jetty-build.patch As mentioned on the ML thread: switch jars to ivy mechanism?. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243293#comment-13243293 ] Michael McCandless commented on LUCENE-3738: Sorry Uwe, that was exactly it: I don't know what to conclude from the perf runs anymore. But +1 for your new patch: it ought to be better since the code is simpler. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3940) When Japanese (Kuromoji) tokenizer removes a punctuation token it should leave a hole
[ https://issues.apache.org/jira/browse/LUCENE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243299#comment-13243299 ] Michael McCandless commented on LUCENE-3940: bq. StandardTokenizer doesn't leave holes when it drops punctuation. But is that really good? This means a PhraseQuery will match across end-of-sentence (.), semicolon, colon, comma, etc. (English examples..). I think tokenizers should throw away as little information as possible... we can always filter out such tokens in a later stage? For example, if a tokenizer created punct tokens (instead of silently discarding them), other token filters could make use of them in the meantime, eg a synonym rule for u.s.a. -> usa or maybe a dedicated English acronyms filter. We could then later filter them out, even not leaving holes, and have the same behavior that we have now? Are there non-English examples where you would want the PhraseQuery to match over punctuation...? EG, for Japanese, I assume we don't want PhraseQuery applying across periods/commas, like it will now? (Not sure about middle dot...? Others...?). When Japanese (Kuromoji) tokenizer removes a punctuation token it should leave a hole - Key: LUCENE-3940 URL: https://issues.apache.org/jira/browse/LUCENE-3940 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 4.0 Attachments: LUCENE-3940.patch, LUCENE-3940.patch I modified BaseTokenStreamTestCase to assert that the start/end offsets match for graph (posLen > 1) tokens, and this caught a bug in Kuromoji when the decompounding of a compound token has a punctuation token that's dropped. In this case we should leave hole(s) so that the graph is intact, ie, the graph should look the same as if the punctuation tokens were not initially removed, but then a StopFilter had removed them.
This also affects tokens that have no compound over them, ie we fail to leave a hole today when we remove the punctuation tokens. I'm not sure this is serious enough to warrant fixing in 3.6 at the last minute... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
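The hole semantics under discussion can be sketched without Lucene: a removed token is represented by giving its successor a position increment of 2 (StopFilter-style), and an exact phrase match requires adjacent positions. The names here are illustrative, not Lucene API.

```java
public class HoleDemo {
    // Absolute position of each token, given its position increment
    // (an increment > 1 encodes a hole left by a removed token).
    static int[] positions(int[] posIncs) {
        int[] pos = new int[posIncs.length];
        int p = -1;
        for (int i = 0; i < posIncs.length; i++) {
            p += posIncs[i];
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        // "end . start" with '.' silently discarded: increments {1, 1}
        int[] noHole = positions(new int[] {1, 1});
        // same stream, but the filter left a hole for '.': increments {1, 2}
        int[] withHole = positions(new int[] {1, 2});
        // Exact phrase "end start" matches only when positions are adjacent:
        System.out.println(noHole[1] == noHole[0] + 1);     // prints true  (match crosses '.')
        System.out.println(withHole[1] == withHole[0] + 1); // prints false (hole blocks it)
    }
}
```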
[jira] [Commented] (LUCENE-3930) nuke jars from source tree and use ivy
[ https://issues.apache.org/jira/browse/LUCENE-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243598#comment-13243598 ]

Jan Høydahl commented on LUCENE-3930:
-------------------------------------

We have a 7 MB jar which is included in the binary distro twice. Any way to get rid of one?

{code}
./contrib/analysis-extras/lib/icu4j-4.8.1.1.jar
./contrib/extraction/lib/icu4j-4.8.1.1.jar
{code}

Also, from what I can see, the {{solr/contrib/extraction/lib/xml-apis-1.0.b2.jar}} dependency is redundant - tests pass without it. See https://issues.apache.org/jira/browse/TIKA-412 and https://issues.apache.org/jira/browse/LUCENE-2961

nuke jars from source tree and use ivy
--------------------------------------

Key: LUCENE-3930
URL: https://issues.apache.org/jira/browse/LUCENE-3930
Project: Lucene - Java
Issue Type: Task
Components: general/build
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Blocker
Fix For: 3.6
Attachments: LUCENE-3930-skip-sources-javadoc.patch, LUCENE-3930-solr-example.patch, LUCENE-3930-solr-example.patch, LUCENE-3930.patch, LUCENE-3930.patch, LUCENE-3930.patch, LUCENE-3930__ivy_bootstrap_target.patch, LUCENE-3930_includetestlibs_excludeexamplexml.patch, ant_-verbose_clean_test.out.txt, noggit-commons-csv.patch, patch-jetty-build.patch

As mentioned on the ML thread: switch jars to ivy mechanism?.
[jira] [Assigned] (SOLR-3254) Upgrade Solr to Tika 1.1
[ https://issues.apache.org/jira/browse/SOLR-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl reassigned SOLR-3254:
---------------------------------

Assignee: Jan Høydahl

Upgrade Solr to Tika 1.1
------------------------

Key: SOLR-3254
URL: https://issues.apache.org/jira/browse/SOLR-3254
Project: Solr
Issue Type: Improvement
Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Fix For: 4.0
Attachments: SOLR-3254.patch

Tika 1.1 is being released soon. It features some new parsers, the ability to extract text from password-protected PDFs and office docs, and several bug fixes. See http://people.apache.org/~mattmann/apache-tika-1.1/rc1/CHANGES-1.1.txt

We should upgrade as soon as it is released.
[jira] [Updated] (SOLR-1929) Index encrypted pdf files
[ https://issues.apache.org/jira/browse/SOLR-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-1929:
------------------------------

Fix Version/s: 4.0
Assignee: Jan Høydahl

Index encrypted pdf files
-------------------------

Key: SOLR-1929
URL: https://issues.apache.org/jira/browse/SOLR-1929
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Reporter: Yiannis Pericleous
Assignee: Jan Høydahl
Priority: Minor
Fix For: 4.0
Attachments: SOLR-1929.patch

SolrCell is not able to index encrypted PDFs. This is easily done by supplying the password in the metadata passed on to Tika.
[jira] [Commented] (LUCENE-3939) ClassCastException thrown in the map(String,int,TermVectorOffsetInfo[],int[]) method in org.apache.lucene.index.SortedTermVectorMapper
[ https://issues.apache.org/jira/browse/LUCENE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243607#comment-13243607 ]

SHIN HWEI TAN commented on LUCENE-3939:
---------------------------------------

Thanks for the quick response. I don't think that passing null as the Comparator is the problem. For example, if the first invocation of the method map is commented out (as below), then there is no exception thrown. In this case, the Comparator is still null.

{code}
org.apache.lucene.index.SortedTermVectorMapper var3 = new org.apache.lucene.index.SortedTermVectorMapper(false, false, (java.util.Comparator) null);
var3.setExpectations("", 0, false, false);
var3.map("*:", (-1), (org.apache.lucene.index.TermVectorOffsetInfo[]) null, (int[]) null);
{code}

ClassCastException thrown in the map(String,int,TermVectorOffsetInfo[],int[]) method in org.apache.lucene.index.SortedTermVectorMapper
--------------------------------------------------------------------------------------------------------------------------------------

Key: LUCENE-3939
URL: https://issues.apache.org/jira/browse/LUCENE-3939
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Affects Versions: 3.0.2, 3.1, 3.4, 3.5
Reporter: SHIN HWEI TAN
Original Estimate: 0.05h
Remaining Estimate: 0.05h

The method map in the SortedTermVectorMapper class does not check the parameter term for valid values. It throws a ClassCastException when called with an invalid string for the parameter term (i.e., var3.map("*", (-1), null, null)). The exception thrown is due to an explicit cast (i.e., casting the return value of termToTVE.get(term) to type TermVectorEntry).

Suggested fix: replace the beginning of the method body of map in SortedTermVectorMapper like this:

{code}
public void map(String term, int frequency, TermVectorOffsetInfo[] offsets, int[] positions) {
  if (termToTVE.get(term) instanceof TermVectorEntry) {
    TermVectorEntry entry = (TermVectorEntry) termToTVE.get(term);
    ...
  }
}
{code}
[jira] [Updated] (SOLR-1856) In Solr Cell, literals should override Tika-parsed values
[ https://issues.apache.org/jira/browse/SOLR-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-1856:
------------------------------

Affects Version/s: (was: 1.4)
Fix Version/s: 4.0
Assignee: Jan Høydahl

In Solr Cell, literals should override Tika-parsed values
---------------------------------------------------------

Key: SOLR-1856
URL: https://issues.apache.org/jira/browse/SOLR-1856
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Reporter: Chris Harris
Assignee: Jan Høydahl
Fix For: 4.0
Attachments: SOLR-1856.patch

I propose that ExtractingRequestHandler / SolrCell literals should take precedence over Tika-parsed metadata in all situations, including where multiValued=true. (Compare SOLR-1633?)

My personal motivation is that I have several fields (e.g. title, date) where my own metadata is much superior to what Tika offers, and I want to throw those Tika values away. (I actually wouldn't mind throwing away _all_ Tika-parsed values, but let's set that aside.)

SOLR-1634 is one potential approach to this, but the fix here might be simpler. I'll attach a patch shortly.
[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2649:
------------------------------

Affects Version/s: (was: 3.3)
Fix Version/s: 4.0

MM ignored in edismax queries with operators
--------------------------------------------

Key: SOLR-2649
URL: https://issues.apache.org/jira/browse/SOLR-2649
Project: Solr
Issue Type: Bug
Components: query parsers
Reporter: Magnus Bergmark
Priority: Minor
Fix For: 4.0

Hypothetical scenario:

1. User searches for "stocks oil gold" with MM set to 50%
2. User adds -stockings to the query: "stocks oil gold -stockings"
3. User gets no hits, since MM was ignored and all terms were AND-ed together

The behavior seems to be intentional, although the reason why is never explained:

{code}
// For correct lucene queries, turn off mm processing if there
// were explicit operators (except for AND).
boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
{code}

(lines 232-234, taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)

This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax.
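The quoted lines boil down to: mm is silently dropped as soon as the query contains any explicit OR, NOT, + or - operator. A standalone sketch of that check (hypothetical method and tokenization, not the actual ExtendedDismaxQParserPlugin code):

```java
public class MmCheckDemo {

    // Mirrors the quoted logic: min-should-match is applied only when the
    // query has no explicit operators other than AND.
    static boolean minShouldMatchApplies(String q) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        for (String tok : q.trim().split("\\s+")) {
            if (tok.equals("OR")) numOR++;
            else if (tok.equals("NOT")) numNOT++;
            else if (tok.startsWith("+")) numPluses++;
            else if (tok.startsWith("-")) numMinuses++;
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        System.out.println(minShouldMatchApplies("stocks oil gold"));            // mm honored
        System.out.println(minShouldMatchApplies("stocks oil gold -stockings")); // mm dropped
    }
}
```

This reproduces the reported scenario: adding a single -term flips the whole query from mm=50% semantics to plain conjunction.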
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243611#comment-13243611 ]

Jan Høydahl commented on SOLR-2366:
-----------------------------------

Note to self: catch up on this again :)

Facet Range Gaps
----------------

Key: SOLR-2366
URL: https://issues.apache.org/jira/browse/SOLR-2366
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2366.patch, SOLR-2366.patch

There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets.

(Original syntax proposal removed, see discussion for concrete syntax)
[jira] [Closed] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl closed SOLR-1895.
-----------------------------

Resolution: Won't Fix
Fix Version/s: (was: 4.0)

Closing this as Won't Fix since the fix is checked in to MCF's source tree.

ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time
----------------------------------------------------------------------------------

Key: SOLR-1895
URL: https://issues.apache.org/jira/browse/SOLR-1895
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Reporter: Karl Wright
Labels: document, security, solr
Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java, SOLR-1895-queries.patch, SOLR-1895-queries.patch, SOLR-1895-queries.patch, SOLR-1895-queries.patch, SOLR-1895-queries.patch, SOLR-1895-service-plugin.patch, SOLR-1895-service-plugin.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch, SOLR-1895.patch

I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service.
The component requires you to configure the appropriate authority service URL base, e.g.:

{code}
<!-- LCF document security enforcement component -->
<searchComponent name="lcfSecurity" class="LCFSecurityFilter">
  <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
</searchComponent>
{code}

Also required are the following schema.xml additions:

{code}
<!-- Security fields -->
<field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
{code}

Finally, to tie it into the standard request handler, it seems to need to run last:

{code}
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>lcfSecurity</str>
  </arr>
  ...
</requestHandler>
{code}

I have not set a package for this code. Nor have I been able to get it reviewed by someone as conversant with Solr as I would prefer. It is my hope, however, that this module will become part of the standard Solr 1.5 suite of search components, since that would tie it in with LCF nicely.
[jira] [Commented] (SOLR-1758) schema definition for configuration files (validation, XSD)
[ https://issues.apache.org/jira/browse/SOLR-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243613#comment-13243613 ]

Jan Høydahl commented on SOLR-1758:
-----------------------------------

Mike, do you have an updated patch for this? What do you think about holding the XSDs inside the war?

schema definition for configuration files (validation, XSD)
-----------------------------------------------------------

Key: SOLR-1758
URL: https://issues.apache.org/jira/browse/SOLR-1758
Project: Solr
Issue Type: New Feature
Reporter: Jorg Heymans
Labels: configuration, schema.xml, solrconfig.xml, validation, xsd
Fix For: 4.0
Attachments: config-validation-20110523.patch

It is too easy to make configuration errors in Solr without getting warnings. We should explore ways of validating configurations. See mailing list discussion at http://search-lucene.com/m/h6xKf1EShE6
[jira] [Updated] (SOLR-2934) Problem with Solr Hunspell with French Dictionary
[ https://issues.apache.org/jira/browse/SOLR-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2934:
------------------------------

Fix Version/s: 4.0

Problem with Solr Hunspell with French Dictionary
-------------------------------------------------

Key: SOLR-2934
URL: https://issues.apache.org/jira/browse/SOLR-2934
Project: Solr
Issue Type: Bug
Components: Schema and Analysis
Affects Versions: 3.5
Environment: Windows 7
Reporter: Nathan Castelein
Assignee: Chris Male
Fix For: 4.0
Attachments: en_GB.aff, en_GB.dic

I'm trying to add the HunspellStemFilterFactory to my Solr project. I'm trying this on a fresh new download of Solr 3.5. I downloaded the French dictionary here (found it from here): http://www.dicollecte.org/download/fr/hunspell-fr-moderne-v4.3.zip

But when I start Solr and go to the Solr Analysis page, an error occurs in Solr. Here is the trace:

{code}
java.lang.RuntimeException: Unable to load hunspell data! [dictionary=en_GB.dic,affix=fr-moderne.aff]
    at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:82)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:546)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:126)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:461)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mortbay.jetty.Server.doStart(Server.java:224)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:194)
    at org.mortbay.start.Main.start(Main.java:534)
    at org.mortbay.start.Main.start(Main.java:441)
    at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 3
    at java.lang.String.charAt(Unknown Source)
    at org.apache.lucene.analysis.hunspell.HunspellDictionary$DoubleASCIIFlagParsingStrategy.parseFlags(HunspellDictionary.java:382)
    at org.apache.lucene.analysis.hunspell.HunspellDictionary.parseAffix(HunspellDictionary.java:165)
    at org.apache.lucene.analysis.hunspell.HunspellDictionary.readAffixFile(HunspellDictionary.java:121)
    at org.apache.lucene.analysis.hunspell.HunspellDictionary.<init>(HunspellDictionary.java:64)
    at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:46)
{code}

I can't find where the problem is. It seems like my dictionary isn't well written for Hunspell, but I tried with two different dictionaries, and I had the same problem. I also tried with an English dictionary, and... it works! So I think that my French dictionary is wrong for Hunspell, but I don't know why... Can you help me?
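For context on the StringIndexOutOfBoundsException in the trace above: DoubleASCIIFlagParsingStrategy reads affix flags two ASCII characters at a time, so a flag field whose length doesn't match that scheme can walk charAt past the end of the string. A rough standalone sketch of the two-chars-per-flag idea (hypothetical code, not Lucene's actual parser) that rejects such input up front instead:

```java
import java.util.Arrays;

public class FlagParseDemo {

    // Split a "double ASCII" flag string into two-character flags,
    // e.g. "aBcD" -> ["aB", "cD"]. An odd-length input is rejected up
    // front rather than triggering an out-of-range charAt later.
    static String[] parseDoubleAsciiFlags(String raw) {
        if (raw.length() % 2 != 0) {
            throw new IllegalArgumentException("flag field length must be even: " + raw);
        }
        String[] flags = new String[raw.length() / 2];
        for (int i = 0; i < flags.length; i++) {
            flags[i] = raw.substring(2 * i, 2 * i + 2);
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseDoubleAsciiFlags("aBcD")));
    }
}
```

One plausible (unconfirmed) explanation for the report: the French .aff file may declare a FLAG mode that doesn't match the parsing strategy Solr selected, which would also explain why the English dictionary loads fine.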
[jira] [Closed] (SOLR-435) QParser must validate existence/absence of q parameter
[ https://issues.apache.org/jira/browse/SOLR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley closed SOLR-435.
-----------------------------

Resolution: Fixed

Re-committed to 4.x, and I moved the CHANGES.txt entry from the v4 to the v3 section on both branches. Closing issue.

QParser must validate existence/absence of q parameter
------------------------------------------------------

Key: SOLR-435
URL: https://issues.apache.org/jira/browse/SOLR-435
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 1.3
Reporter: Ryan McKinley
Assignee: David Smiley
Fix For: 3.6, 4.0
Attachments: SOLR-2001_3x_backport_with_empty_string_check_and_test.patch, SOLR-435.patch, SOLR-435_3x_consistent_errors.patch, SOLR-435_q_defaults_to_all-docs.patch

Each QParser should check if q exists or not. For some it will be required, for others not. Currently it throws a null pointer:

{code}
java.lang.NullPointerException
    at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36)
    at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
    at org.apache.solr.search.QParser.getQuery(QParser.java:80)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67)
    at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150)
    ...
{code}

see: http://www.nabble.com/query-parsing-error-to14124285.html#a14140108
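The fix direction SOLR-435 asks for amounts to an explicit guard before any parsing touches q. A minimal standalone sketch (hypothetical helper, not the committed patch):

```java
import java.util.Map;

public class QParamGuard {

    // Fail with a clear message instead of an NPE deep inside splitSmart
    // when the q parameter is absent or blank.
    static String requireQ(Map<String, String> params) {
        String q = params.get("q");
        if (q == null || q.trim().isEmpty()) {
            throw new IllegalArgumentException("missing or empty required parameter: q");
        }
        return q;
    }

    public static void main(String[] args) {
        System.out.println(requireQ(Map.of("q", "title:lucene")));
    }
}
```

Parsers for which q is optional would instead branch on the guard's outcome (e.g. fall back to a match-all query), which is the "existence/absence" distinction the issue title draws.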
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12928 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12928/

2 tests failed.

REGRESSION: org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message:
java.lang.AssertionError: org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch: Insane FieldCache usage(s) found expected:<0> but was:<1>

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch: Insane FieldCache usage(s) found expected:<0> but was:<1>
    at org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:819)
    at org.apache.lucene.util.LuceneTestCase.access$900(LuceneTestCase.java:138)
    at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:676)
    at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:69)
    at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:591)
    at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75)
    at org.apache.lucene.util.LuceneTestCase$SaveThreadAndTestNameRule$1.evaluate(LuceneTestCase.java:642)
    at org.junit.rules.RunRules.evaluate(RunRules.java:18)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
    at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:63)
    at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75)
    at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:38)
    at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:69)
    at org.junit.rules.RunRules.evaluate(RunRules.java:18)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
    at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
Caused by: java.lang.AssertionError: org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch: Insane FieldCache usage(s) found expected:<0> but was:<1>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:930)
    at org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:809)
    ... 28 more

FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
ERROR: SolrIndexSearcher opens=93 closes=91

Stack Trace:
junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=93 closes=91
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$3.addError(JUnitTestRunner.java:974)
    at junit.framework.TestResult.addError(TestResult.java:38)
    at junit.framework.JUnit4TestAdapterCache$1.testFailure(JUnit4TestAdapterCache.java:51)
    at org.junit.runner.notification.RunNotifier$4.notifyListener(RunNotifier.java:100)
    at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:41)
    at org.junit.runner.notification.RunNotifier.fireTestFailure(RunNotifier.java:97)
    at org.junit.internal.runners.model.EachTestNotifier.addFailure(EachTestNotifier.java:26)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:306)
    at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)