[jira] [Closed] (PYLUCENE-25) JCC: NameError: global name 'StringWriter' is not defined occurs when java exception raised

2013-06-19 Thread Ilia Meerovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PYLUCENE-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilia Meerovich closed PYLUCENE-25.
--

Resolution: Implemented

 JCC: NameError: global name 'StringWriter' is not defined occurs when java 
 exception raised
 -

 Key: PYLUCENE-25
 URL: https://issues.apache.org/jira/browse/PYLUCENE-25
 Project: PyLucene
  Issue Type: Bug
Reporter: Ilia Meerovich
  Labels: jcc

I used JCC and tried to run the generated Python code.
I noticed that when a Java exception occurs, Python throws a NameError exception:
NameError: global name 'StringWriter' is not defined
It looks like __init__.py needs to adapt to the full-names feature.
I found that somebody already sent an email regarding a similar failure:
 http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/201302.mbox/%3Calpine.OSX.2.01.1302041320590.1972@yuzu.local%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Reestablishing a Solr node that ran on a completely crashed machine

2013-06-19 Thread Per Steffensen

On 6/18/13 2:15 PM, Mark Miller wrote:

I don't know what the best method to use now is, but the slightly longer term 
plan is to:

* Have a new mode where you cannot preconfigure cores, only use the 
collection's API.
* ZK becomes the cluster state truth.
* The Overseer takes actions to ensure cores live/die in different places based 
on the truth in ZK.
Not that we have to decide on this now, but I guess in my scenario I 
do not see why the Overseer should be involved. The replica is already 
assigned to run on the replaced machine with a specific IP/hostname 
(actually a specific Solr node-name), so I guess that the Solr node 
itself on this new/replaced machine should just go look in ZK when it 
starts up and realize that it ought to run this and that replica and 
start loading them itself. I recognize that the Overseer should/could be 
involved in relocating replicas for different reasons - loadbalancing, 
rack-awareness etc. But in cases where a replica is already assigned to 
a certain node-name according to ZK state, but the node is not 
preconfigured (in solr.xml) to run this replica, the node itself should 
just realize that it ought to run it anyway and load it. But it probably 
has to be thought through well. Just my immediate thoughts.
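A minimal sketch of that startup check (illustrative only; the node-name format, ZK address, and the idea of string-matching the cluster state instead of parsing the JSON are all assumptions):

{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical startup step: the node asks ZK which replicas are assigned to
// its own node-name and loads them itself, without waiting for the Overseer.
public class StartupReplicaCheck {
  public static void main(String[] args) throws Exception {
    String myNodeName = "host1:8983_solr";   // assumed node-name format
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, new Watcher() {
      public void process(WatchedEvent event) { /* no-op */ }
    });
    byte[] raw = zk.getData("/clusterstate.json", false, null);
    String clusterState = new String(raw, "UTF-8");
    // A real implementation would parse the JSON and walk collections/shards;
    // this only illustrates the "look yourself up in ZK" step.
    if (clusterState.contains(myNodeName)) {
      System.out.println("ZK assigns replicas to this node; load those cores.");
    }
    zk.close();
  }
}
{code}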


- Mark



Regards, Per Steffensen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4792) stop shipping a war in 5.0

2013-06-19 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687659#comment-13687659
 ] 

Noble Paul commented on SOLR-4792:
--

Thanks Shawn for pointing me to the list. Seriously, I was asleep at the 
wheel.


Mark Miller nicely captured everything I have to say on this subject and I have 
very little to add. I always wanted Solr to be a standalone app.

+1

 stop shipping a war in 5.0
 --

 Key: SOLR-4792
 URL: https://issues.apache.org/jira/browse/SOLR-4792
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 5.0

 Attachments: SOLR-4792.patch


 see the vote on the developer list.
 This is the first step: if we stop shipping a war then we are free to do 
 anything we want. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-06-19 Thread selckin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687730#comment-13687730
 ] 

selckin commented on LUCENE-4583:
-

A few comments up someone asked for a use case; shouldn't something like 
http://www.elasticsearch.org/guide/reference/mapping/source-field/ be a perfect 
thing to use BinaryDocValues for?

I was trying to store something similar using DiskDocValuesFormat and hit the 
32k limit.

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
     Directory dir = newDirectory();
     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
     Document doc = new Document();
     BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
     bytes.length = bytes.bytes.length; // byte data doesn't matter
     doc.add(new StraightBytesDocValuesField("dvField", bytes));
     writer.addDocument(doc);
     writer.commit();
     writer.close();
     DirectoryReader reader = DirectoryReader.open(dir);
     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
     // FAILS IF BYTES IS BIG!
     docValues.getSource().getBytes(0, bytes);
     reader.close();
     dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5064) Add PagedMutable

2013-06-19 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-5064.
--

Resolution: Fixed

 Add PagedMutable
 

 Key: LUCENE-5064
 URL: https://issues.apache.org/jira/browse/LUCENE-5064
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.4

 Attachments: LUCENE-5064.patch


 In the same way that we now have a PagedGrowableWriter, we could have a 
 PagedMutable which would behave just like PackedInts.Mutable but would 
 support more than 2B values.
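 A rough usage sketch of the proposal (the constructor shape here is an assumption modeled on PagedGrowableWriter and PackedInts.Mutable, not a final API):

 {code:java}
 import org.apache.lucene.util.packed.PackedInts;

 // Hypothetical usage of the proposed PagedMutable: same contract as
 // PackedInts.Mutable (fixed bits per value), but addressed by long so it can
 // hold more than 2B values by splitting storage into pages.
 long numValues = 5000000000L;          // more than Integer.MAX_VALUE values
 int pageSize = 1 << 20;                // values per page, a power of two
 int bitsPerValue = 17;
 PagedMutable values = new PagedMutable(numValues, pageSize, bitsPerValue,
     PackedInts.COMPACT);
 values.set(3000000000L, 42);           // long-addressable, unlike Mutable
 long v = values.get(3000000000L);
 {code}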

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5006) Simplify / understand IndexWriter/DocumentsWriter synchronization

2013-06-19 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-5006:


Attachment: LUCENE-5006.patch

Here is a cleaned-up version of the patch.

I removed the accidentally added (leftover) int[] from BytesRefHash that was 
indeed unintended.

I also removed all the leftovers like the forcePurge and applyDeletes flags; they 
were still in there from a previous iteration without the queue. I changed 
_maybeMerge_ to _hasEvents_ consistently.

The changes in DWPT and DWPTThreadPool are mainly due to the fact that I moved 
the creation of DWPT into DW and out of the ThreadPool. The ThreadPool only 
maintains the ThreadState instances but is not responsible for creating the 
actual DWPT. DWPT is not reusable anymore; we never really reused them anyway, 
but if they were initialized and we did a full flush we kept using them 
with a new DeleteQueue, which is gone now. This is nice since DWPT is now solely 
initialized in its ctor. This includes the segment name, which we obtain from IW 
when the DWPT is created. This remains the only place where we sync on IW, which 
is done in updateDocument right now. 

I think this patch is a step in the right direction toward making this simpler. At 
the end of the day I'd want to change the lifetime of a DW to be a single flush 
and replace the entire DW once we flush or reopen. This would make a lot of 
logic much simpler, but I don't want to make this big change at once, so maybe we 
should work to get the current patch into trunk and let it bake in a bit.

 Simplify / understand IndexWriter/DocumentsWriter synchronization
 -

 Key: LUCENE-5006
 URL: https://issues.apache.org/jira/browse/LUCENE-5006
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Attachments: LUCENE-5006.patch, LUCENE-5006.patch


 The concurrency in IW/DW/BD is terrifying: there are many locks involved, not 
 just intrinsic locks but IW also has fullFlushLock, commitLock, and there are 
 no clear rules about lock order to avoid deadlocks like LUCENE-5002.
 We have to somehow simplify this, and define the allowed concurrent behavior 
 eg when an app calls deleteAll while other threads are indexing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-06-19 Thread Artem Lukanin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687822#comment-13687822
 ] 

Artem Lukanin commented on LUCENE-5030:
---

I see that some tests in AnalyzingSuggesterTest fail, so I have to look at what's 
wrong...

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none
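 A rough illustration of the FuzzyQuery-style approach described above, against the 4.x automaton API (a sketch, not the actual FuzzySuggester code; the final intersection with the suggest FST is only hinted at):

 {code:java}
 import org.apache.lucene.util.automaton.Automaton;
 import org.apache.lucene.util.automaton.LevenshteinAutomata;
 import org.apache.lucene.util.automaton.UTF32ToUTF8;

 // Build the Levenshtein automaton over Unicode code points, then convert it
 // to UTF-8 labels so it can be intersected with the byte-based suggest FST.
 String query = "москва";              // edits counted in code-point space
 int maxEdits = 1;
 Automaton lev = new LevenshteinAutomata(query, false).toAutomaton(maxEdits);
 Automaton utf8Lev = new UTF32ToUTF8().convert(lev);
 // utf8Lev would then be intersected with the suggester's FST, e.g. via
 // BasicOperations.intersection(utf8Lev, suggestFstAutomaton) on an automaton
 // view of the FST (suggestFstAutomaton is hypothetical here).
 {code}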

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4939) Not able to import oracle DB on RedHat

2013-06-19 Thread Subhash Karemore (JIRA)
Subhash Karemore created SOLR-4939:
--

 Summary: Not able to import oracle DB on RedHat
 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore


I have configured my RedHat system for Solr. After that I started Solr and it 
started properly. I have to import the Oracle DB for indexing. My data 
config file is:

<dataConfig>
  <dataSource type="JdbcDataSource" 
              driver="oracle.jdbc.driver.OracleDriver" 
              url="jdbc:oracle:thin:@//hostname:2126/DBNAme" user="user" password="Passwd" 
              batchSize="1" />
  <document>
    <entity name="table1" query="SELECT ID, col2, col3 FROM table1 
        WHERE rownum BETWEEN 1 AND 1000">
      <field column="ID" name="id" />
      <field column="col2" name="col2" />
      <field column="col3" name="col3" />
    </entity>
  </document>
</dataConfig>

I have made similar changes to the schema.xml file.

I have copied solr-dataimporthandler-4.3.0.jar, 
solr-dataimporthandler-extras-4.3.0.jar and solr-solrj-4.3.0.jar from the dist folder 
to the ../lib folder. I have also downloaded ojdbc6.jar and put it in the same folder.

With this setup it works properly on Windows. However, on RedHat it does 
not work; it gives me errors when I try to index the DB.

Below are the errors which I got on console.

ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while 
processing: table1 document : 
SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
 Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
BETWEEN 1 AND 1000 Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
could not establish the connection
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
at 
oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
at 
oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
... 12 more
Caused by: oracle.net.ns.NetException: The Network Adapter could not establish 
the connection
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392)
at 
oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:434)
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:687)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:247)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1102)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:320)
... 21 more
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 

IndexWriter commit user data takes a map

2013-06-19 Thread Varun Thacker
I was just curious as to why IW.setCommitData uses a map?

Looking back at LUCENE-1382, when committing user data was introduced it
took a string.

In LUCENE-4575 it was refactored and changed to a Map. From the comments I
couldn't really figure out why it was changed.

-- 


Regards,
Varun Thacker
http://www.vthacker.in/


[jira] [Created] (LUCENE-5068) QueryParserUtil.escape() does not escape forward slash

2013-06-19 Thread Matias Holte (JIRA)
Matias Holte created LUCENE-5068:


 Summary: QueryParserUtil.escape() does not escape forward slash
 Key: LUCENE-5068
 URL: https://issues.apache.org/jira/browse/LUCENE-5068
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.0
Reporter: Matias Holte
Priority: Minor


QueryParserUtil.escape() and QueryParser.escape() have different 
implementations. Most importantly, the former omits escaping the forward slash 
(/). This in turn caused errors in the query parser when a query ended with a 
forward slash.
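A small demonstration of the difference (a sketch; the exact escaped output shown in the comments is indicative):

{code:java}
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.flexible.standard.QueryParserUtil;

// The classic parser's escape() handles '/', the flexible one's does not,
// so a trailing slash survives unescaped and can break parsing.
String input = "path/to/doc/";
System.out.println(QueryParser.escape(input));      // path\/to\/doc\/
System.out.println(QueryParserUtil.escape(input));  // path/to/doc/ (slash kept)
{code}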

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-06-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687883#comment-13687883
 ] 

Robert Muir commented on LUCENE-4583:
-

good god no.

DocValues are not stored fields... 

This reinforces the value of the limit!
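For the use case mentioned above (keeping the raw document source), stored fields are the intended mechanism; a brief contrast of the two field types (illustrative field names and values):

{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.util.BytesRef;

// Stored fields hold large per-document blobs such as the raw JSON source
// and are not subject to the 32k-per-value DocValues limit.
byte[] jsonSource = "{\"title\":\"example\"}".getBytes(StandardCharsets.UTF_8);
Document doc = new Document();
doc.add(new StoredField("_source", jsonSource));
// DocValues are a column-stride structure for small per-document values
// consulted during sorting, faceting, and scoring, hence the tight limit.
doc.add(new BinaryDocValuesField("facet_payload", new BytesRef("electronics")));
{code}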

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
     Directory dir = newDirectory();
     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
     Document doc = new Document();
     BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
     bytes.length = bytes.bytes.length; // byte data doesn't matter
     doc.add(new StraightBytesDocValuesField("dvField", bytes));
     writer.addDocument(doc);
     writer.commit();
     writer.close();
     DirectoryReader reader = DirectoryReader.open(dir);
     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
     // FAILS IF BYTES IS BIG!
     docValues.getSource().getBytes(0, bytes);
     reader.close();
     dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-06-19 Thread selckin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687894#comment-13687894
 ] 

selckin commented on LUCENE-4583:
-

OK, from the talks I watched on them & other info gathered it seemed like it 
would be a good fit; guess I really missed the point somewhere. I can't find much 
info in the javadocs either, but I guess this is for the user list and I 
shouldn't pollute this issue.

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
     Directory dir = newDirectory();
     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
     Document doc = new Document();
     BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
     bytes.length = bytes.bytes.length; // byte data doesn't matter
     doc.add(new StraightBytesDocValuesField("dvField", bytes));
     writer.addDocument(doc);
     writer.commit();
     writer.close();
     DirectoryReader reader = DirectoryReader.open(dir);
     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
     // FAILS IF BYTES IS BIG!
     docValues.getSource().getBytes(0, bytes);
     reader.close();
     dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-06-19 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4583.
-

Resolution: Not A Problem

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
     Directory dir = newDirectory();
     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
     Document doc = new Document();
     BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
     bytes.length = bytes.bytes.length; // byte data doesn't matter
     doc.add(new StraightBytesDocValuesField("dvField", bytes));
     writer.addDocument(doc);
     writer.commit();
     writer.close();
     DirectoryReader reader = DirectoryReader.open(dir);
     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
     // FAILS IF BYTES IS BIG!
     docValues.getSource().getBytes(0, bytes);
     reader.close();
     dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-06-19 Thread Artem Lukanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Lukanin updated LUCENE-5030:
--

Attachment: nonlatin_fuzzySuggester.patch

Now tests in FuzzySuggesterTest and AnalyzingSuggesterTest pass, except for 
AnalyzingSuggesterTest.testRandom (when preserveSep = true).

If I enable VERBOSE, I see that the suggestions are correct. I guess there is a 
bug in the test, but I cannot find it.

Can you please review?

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-06-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687902#comment-13687902
 ] 

Robert Muir commented on LUCENE-5030:
-

I don't think changing SEP_LABEL from a single byte to 4 bytes is necessarily a 
good idea.

I think benchmarks (size and speed) should be run on this change before we jump 
into it. I'm also concerned about the determinization and shit being in the 
middle of an autosuggest request... this seems like it would be way, way too 
slow.

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Estimating Solr memory requirements

2013-06-19 Thread Erick Erickson
OK, I seem to have stalled on this. Over part of the winter, I put
together a Swing-based program to help estimate Solr/Lucene memory
requirements, with all the usual caveats; see:
https://github.com/ErickErickson/SolrMemoryEsitmator.

I have notes to myself that it's still deficient in several areas:
FieldValueCache estimates
tlog requirements
Memory required to re-open a searcher
Position and term vector memory requirements
And whatever I haven't thought about yet.

Of course it builds on Grant's spreadsheet (read: steals from it
shamelessly!). I'm hoping to have a friendlier interface. And _of
course_ I'd be willing to donate it to Solr as a util/contrib/whatever
if it fits.

So, what I'm about here is a few things:

* Anyone who wants to try it, feel free. The build instructions are at the 
  above link, but the short form is to clone it, run 'ant jar' and 'java -jar 
  dist/estimator.jar'. Enter some field info and hit the Add/Save button, then 
  hit the Dump calcs button to see what it does currently.

It also saves the estimates away in a file and shows all the steps it
goes through to perform the calculations. It'll also make rudimentary
field definitions from the entered data. You can come back to it later
and add to what you've already done.

* Make any improvements you see fit, particularly to flesh out the deficiencies 
  listed above.

* Anyone who has, you know, graphic design/Swing skills, please feel free to 
  make it better. I'm a newbie as far as using Swing is concerned, and the way 
  I align buttons and checkboxes is pretty hacky. But it works.

* Any suggestions anyone wants to make. Suggestions in code are nicest of 
  course, but algorithms for calculating, say, position and tv memory usage 
  would be great as well! Isolated code snippets that I could incorporate would 
  be great too.

* Any info on where I've gotten the calculations wrong or don't show enough info 
  to actually figure out whether they're correct or not.

Note that the goal for this is to give a rough idea of memory
requirements and be easy to use. The spreadsheet is a bit daunting to
someone who knows nothing about Solr so this might be an easier way to
get into it.

Thanks,
Erick

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4939) Not able to import oracle DB on RedHat

2013-06-19 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4939.
--

Resolution: Invalid

Please raise this issue on the user's list first to determine whether it's a 
bona-fide bug; I suspect a configuration error. If it is really a bug, we can 
re-open this.

 Not able to import oracle DB on RedHat
 --

 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore

 I have configured my RedHat system for Solr. After that I started Solr and it 
 started properly. I have to import the Oracle DB for indexing. My data 
 config file is:
 <dataConfig>
   <dataSource type="JdbcDataSource" 
               driver="oracle.jdbc.driver.OracleDriver" 
               url="jdbc:oracle:thin:@//hostname:2126/DBNAme" user="user" 
               password="Passwd" batchSize="1" />
   <document>
     <entity name="table1" query="SELECT ID, col2, col3 FROM table1 
         WHERE rownum BETWEEN 1 AND 1000">
       <field column="ID" name="id" />
       <field column="col2" name="col2" />
       <field column="col3" name="col3" />
     </entity>
   </document>
 </dataConfig>
 I have made similar changes to the schema.xml file.
 I have copied solr-dataimporthandler-4.3.0.jar, 
 solr-dataimporthandler-extras-4.3.0.jar and solr-solrj-4.3.0.jar from the dist 
 folder to the ../lib folder. I have also downloaded ojdbc6.jar and put it in the 
 same folder.
 With this setup it works properly on Windows. However, on RedHat it does 
 not work; it gives me errors when I try to index the DB.
 Below are the errors which I got on console.
 ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while 
 processing: table1 document : 
 SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
  Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
 BETWEEN 1 AND 1000 Processing Document # 1
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
 Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
 could not establish the connection
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
 at 
 oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
 at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
 ... 12 more
 Caused by: oracle.net.ns.NetException: The Network Adapter could not 
 establish the connection
 at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392)
 at 
 oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:434)
 at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:687)
 at 

[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-06-19 Thread Artem Lukanin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687917#comment-13687917
 ] 

Artem Lukanin commented on LUCENE-5030:
---

Possibly we should change it to INFO_SEP2 (U+001E), as Michael suggested for 
TokenStreamToAutomaton?
Do you like the 0x10 and 0x10fffe separators in TokenStreamToAutomaton? Won't 
they slow down the process?
I guess Michael is the man who runs benchmarks regularly? I don't know how 
to do it...

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687924#comment-13687924
 ] 

Uwe Schindler commented on SOLR-4939:
-

Check your firewall! I think your server may not have TCP access to the 
database server.
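A quick way to check that from the Solr machine, outside of Solr (hypothetical host and port taken from the data-config above):

{code:java}
import java.net.InetSocketAddress;
import java.net.Socket;

// Probe TCP connectivity to the Oracle listener; a timeout here reproduces the
// "Network Adapter could not establish the connection" error independent of Solr.
public class OraclePortCheck {
  public static void main(String[] args) throws Exception {
    Socket s = new Socket();
    s.connect(new InetSocketAddress("hostname", 2126), 5000);
    System.out.println("TCP connection OK");
    s.close();
  }
}
{code}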

 Not able to import oracle DB on RedHat
 --

 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore

 I have configured my RedHat system for Solr. After that I started Solr and it 
 started properly. I have to import the Oracle DB for indexing. My data 
 config file is:
 <dataConfig>
   <dataSource type="JdbcDataSource" 
               driver="oracle.jdbc.driver.OracleDriver" 
               url="jdbc:oracle:thin:@//hostname:2126/DBNAme" user="user" 
               password="Passwd" batchSize="1" />
   <document>
     <entity name="table1" query="SELECT ID, col2, col3 FROM table1 
         WHERE rownum BETWEEN 1 AND 1000">
       <field column="ID" name="id" />
       <field column="col2" name="col2" />
       <field column="col3" name="col3" />
     </entity>
   </document>
 </dataConfig>
 I have made similar changes to the schema.xml file.
 I have copied solr-dataimporthandler-4.3.0.jar, 
 solr-dataimporthandler-extras-4.3.0.jar and solr-solrj-4.3.0.jar from the dist 
 folder to the ../lib folder. I have also downloaded ojdbc6.jar and put it in the 
 same folder.
 With this setup it works properly on Windows. However, on RedHat it does 
 not work; it gives me errors when I try to index the DB.
 Below are the errors which I got on console.
 ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while 
 processing: table1 document : 
 SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
  Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
 BETWEEN 1 AND 1000 Processing Document # 1
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
 Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
 could not establish the connection
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
 at 
 oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
 at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
 ... 12 more
 Caused by: oracle.net.ns.NetException: The Network Adapter could not 
 establish the connection
 at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392)
 at 
 oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:434)
 at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:687)
 at oracle.net.ns.NSProtocol.connect(NSProtocol.java:247)

[jira] [Created] (SOLR-4940) Cluster crashed for *:* queries with large page number (OOM)

2013-06-19 Thread Bjoern Ebers (JIRA)
Bjoern Ebers created SOLR-4940:
--

 Summary: Cluster crashed for *:* queries with large page number 
(OOM)
 Key: SOLR-4940
 URL: https://issues.apache.org/jira/browse/SOLR-4940
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: One collection is sharded by 8 high mem machines.
Each shard has one replica (additional 8 machines).
The Solr instances are started with -Xmx16384m -Xms4096m.
The index contains around 230-240 million documents.
All Solr instances are connected to a ZooKeeper ensemble with 5 instances.
Reporter: Bjoern Ebers
Priority: Critical


executing the query on the large index: q=*:*&page=1000&max=1000
this caused an OOM and crashed the whole cluster!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4940) Cluster crashed for *:* queries with large page number (OOM)

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687938#comment-13687938
 ] 

Uwe Schindler commented on SOLR-4940:
-

see SOLR-1726

The main issue is: full-text search engines are only good at returning 
top-ranking results. If you increase the window of top-ranking results, the 
underlying algorithms, which are optimized to find the top-n fast, will 
need lots of memory and get slow.
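A sketch of why such a request is expensive (SolrJ shown for illustration; the page/max parameters in the report are non-standard, so standard start/rows paging is assumed):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;

// Deep paging with start/rows: every shard must rank (start + rows) hits in a
// priority queue and ship them to the aggregating node, so memory grows with
// the page number, not with the page size.
SolrQuery q = new SolrQuery("*:*");
int page = 1000, rows = 1000;
q.setStart((page - 1) * rows);   // start=999000: each shard collects ~1M entries
q.setRows(rows);
{code}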

 Cluster crashed for *:* queries with large page number (OOM)
 

 Key: SOLR-4940
 URL: https://issues.apache.org/jira/browse/SOLR-4940
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: One collection is sharded by 8 high mem machines.
 Each shard has one replica (additional 8 machines).
 The Solr instances are started with -Xmx16384m -Xms4096m.
 The index contains around 230-240 million documents.
 All Solr instances are connected to a ZooKeeper ensemble with 5 instances.
Reporter: Bjoern Ebers
Priority: Critical

 executing the query on the large index: q=*:*&page=1000&max=1000
 this caused an OOM and crashed the whole cluster!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Looking for community guidance on SOLR-4872

2013-06-19 Thread Benson Margulies
I write to seek guidance from the dev community on SOLR-4872.

This JIRA concerns lifecycle management for Solr schema components:
tokenizers, token filters, and char filters.

If you read the comments, you'll find three opinions from committers. What
follows are précis: read the JIRA to get the details.

Hoss is in favor of having close methods on these components and arranging
to have them called when a schema is torn down. Hoss is opposed to allowing
these objects to be SolrCoreAware.

Yonik is opposed to having such close methods and prefers SolrCoreAware, or
something like it, or letting component implementors use finalizers.

Rob Muir thinks that there should be a fix to the related LUCENE-2145,
which I see as complementary to this.

So, here I am. I'm not a committer. I'm a builder of Solr plugins, and,
from that standpoint, I think that there should be a lifecycle somehow,
because I try to apply a general principle of avoiding finalizers, and
because in some cases their unpredictable schedule can be a practical
problem.

Is there a committer in this community who is willing to work with me on
this? As things are, I can't see how to proceed, since I'm suspended
between two committers with apparently opposed views.

I have already implemented what I think of as the hard part, and, indeed,
the foundation of either approach. I have a close lifecycle that extends
down to the IndexSchema object and the TokenizerChain. So it remains to
decide whether that should in turn call ordinary close methods on the
tokenizers, token filters, and char filters, or rather look for some
optional lifecycle interface.
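To make the options concrete, here is a rough sketch of what an opt-in close lifecycle on an analysis factory could look like (an entirely hypothetical interface and class, not an existing Solr or Lucene API):

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical opt-in lifecycle: when the schema / TokenizerChain is torn down,
// it would call close() on any analysis factory implementing this interface,
// instead of relying on finalizers or on making the factory SolrCoreAware.
interface ClosableAnalysisFactory extends Closeable {
}

// Example plugin holding an external resource it wants released deterministically.
class MyTokenFilterFactory implements ClosableAnalysisFactory {
  private final RandomAccessFile dictionary;

  MyTokenFilterFactory() throws IOException {
    this.dictionary = new RandomAccessFile("/path/to/dictionary.bin", "r");
  }

  @Override
  public void close() throws IOException {
    dictionary.close();
  }
}
{code}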


List your chair on https://lucene.apache.org/whoweare.html?

2013-06-19 Thread Benson Margulies
A small suggestion: identify the VP on the list of PMC and committers.


[lucene 4.3.1] solr webapp is put to null directory on maven build

2013-06-19 Thread Dmitry Kan
Hello,

executing 'package' on Apache Solr Search Server pom
(maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.

Apache Maven 3.0.4
OS: Ubuntu 12.04 LTS

Thanks,

Dmitry Kan


Re: [lucene 4.3.1] solr webapp is put to null directory on maven build

2013-06-19 Thread Dmitry Kan
also: ${build-directory} is not set anywhere in the project.


On 19 June 2013 16:23, Dmitry Kan dmitry.luc...@gmail.com wrote:

 Hello,

 executing 'package' on Apache Solr Search Server pom
 (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.

 Apache Maven 3.0.4
 OS: Ubuntu 12.04 LTS

 Thanks,

 Dmitry Kan



[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_21) - Build # 6138 - Still Failing!

2013-06-19 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6138/
Java: 32bit/jdk1.7.0_21 -server -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestFieldsReader.testExceptions

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([A3AC19F388354DBF:D5AD4B5B20483309]:0)
at org.apache.lucene.util.BytesRef.copyBytes(BytesRef.java:196)
at org.apache.lucene.util.BytesRef.deepCopyOf(BytesRef.java:343)
at 
org.apache.lucene.codecs.lucene3x.TermBuffer.toTerm(TermBuffer.java:113)
at 
org.apache.lucene.codecs.lucene3x.SegmentTermEnum.term(SegmentTermEnum.java:184)
at 
org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.next(Lucene3xFields.java:863)
at 
org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:292)
at org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:318)
at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:103)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3767)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3371)
at 
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1887)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1697)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
at 
org.apache.lucene.index.TestFieldsReader.testExceptions(TestFieldsReader.java:204)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)




Build Log:
[...truncated 355 lines...]
[junit4:junit4] Suite: org.apache.lucene.index.TestFieldsReader
[junit4:junit4]   2 NOTE: reproduce with: ant test  
-Dtestcase=TestFieldsReader -Dtests.method=testExceptions 
-Dtests.seed=A3AC19F388354DBF -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=mt_MT -Dtests.timezone=Europe/Samara 
-Dtests.file.encoding=ISO-8859-1
[junit4:junit4] ERROR   1.49s J0 | TestFieldsReader.testExceptions 
[junit4:junit4] Throwable #1: java.lang.OutOfMemoryError: Java heap space
[junit4:junit4]at 
__randomizedtesting.SeedInfo.seed([A3AC19F388354DBF:D5AD4B5B20483309]:0)
[junit4:junit4]at 
org.apache.lucene.util.BytesRef.copyBytes(BytesRef.java:196)
[junit4:junit4]at 
org.apache.lucene.util.BytesRef.deepCopyOf(BytesRef.java:343)
[junit4:junit4]at 
org.apache.lucene.codecs.lucene3x.TermBuffer.toTerm(TermBuffer.java:113)
[junit4:junit4]at 
org.apache.lucene.codecs.lucene3x.SegmentTermEnum.term(SegmentTermEnum.java:184)
[junit4:junit4]at 
org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.next(Lucene3xFields.java:863)
[junit4:junit4]at 
org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:292)
[junit4:junit4]at 
org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:318)
[junit4:junit4]at 
org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:103)
[junit4:junit4]at 
org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
[junit4:junit4]at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
[junit4:junit4]at 
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
[junit4:junit4]

Re: Reestablishing a Solr node that ran on a completely crashed machine

2013-06-19 Thread Mark Miller

On Jun 19, 2013, at 2:20 AM, Per Steffensen st...@designware.dk wrote:

 On 6/18/13 2:15 PM, Mark Miller wrote:
 I don't know what the best method to use now is, but the slightly longer 
 term plan is to:
 
 * Have a new mode where you cannot preconfigure cores, only use the 
 collection's API.
 * ZK becomes the cluster state truth.
 * The Overseer takes actions to ensure cores live/die in different places 
 based on the truth in ZK.
 Not that we have to decide on this now, but I guess in my scenario I do not 
 see why the Overseer should be involved. The replica is already assigned to 
 run on the replaced machine with a specific IP/hostname (actually a 
 specific Solr node-name), so I guess that the Solr node itself on this 
 new/replaced machine should just go look in ZK when it starts up and realize 
 that it ought to run this and that replica and start loading them itself. I 
 recognize that the Overseer should/could be involved in relocating replicas 
 for different reasons - loadbalancing, rack-awareness etc. But in cases where 
 a replica is already assigned to a certain node-name according to ZK state, 
 but the node is not preconfigured (in solr.xml) to run this replica, the node 
 itself should just realize that it ought to run it anyway and load it. But it 
 probably has to be thought through well. Just my immediate thoughts.

Specific node names have since been essentially deprecated - auto-assigned 
generic node names are what we have transitioned to. You should easily be able 
to host a shard with a machine that has a different address without confusion. 

By and large, the Overseer will be able to assume responsibility for 
assignments (though I'm sure how much it will do will be configurable) at a 
high level. It will be able to do things like look at maxShardsPerNode and 
replicationFactor and periodically follow rules to make adjustments. 

The Overseer being in charge is more a conceptual idea though, not the 
implementation. When a core starts up, checks with ZK, and sees that the 
collection it belongs to no longer exists or something, it's likely to just not 
load rather than wait for an Overseer to spot it and remove it later.

- Mark
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4921) Support for Adding Documents via the Solr UI

2013-06-19 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-4921:
--

Attachment: SOLR-4921.patch

The patch has the following improvements:
# Better Layout
# Result Reporting, including errors
# Various other little fixes

You should be able to submit a variety of document types at this point and see 
the response.

Left to do:
# Icon for Collection drop down
# Wizard implementation
# General cleanup, comments
# File Upload
# Other things I've forgotten

 Support for Adding Documents via the Solr UI
 

 Key: SOLR-4921
 URL: https://issues.apache.org/jira/browse/SOLR-4921
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, 
 SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch


 For demos and prototyping, it would be nice if we could add documents via the 
 admin UI.
 Various things to support:
 1. Uploading XML, JSON, CSV, etc.
 2. Optionally also do file upload

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [lucene 4.3.1] solr webapp is put to null directory on maven build

2013-06-19 Thread Dmitry Kan
After adding:

<build-directory>target</build-directory>

the war file is put into the target subdir.


On a side note:

running solr with the maven jetty plugin seems to work; it required two
artifacts (I couldn't figure out where jetty stores the lib dir in this
mode):

command. mvn jetty:run-war

(configured in the jetty-maven-plugin):

  <dependencies>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.0.13</version>
    </dependency>
    <dependency>
      <groupId>tomcat</groupId>
      <artifactId>commons-logging</artifactId>
      <version>4.0.6</version>
    </dependency>
  </dependencies>


when starting the webapp, however, solr tries to create a collection1:

17:02:53.108 [coreLoadExecutor-3-thread-1] INFO
 org.apache.solr.core.CoreContainer - Creating SolrCore 'collection1' using
instanceDir: ${top-level}/solr/example/solr/collection1

Apparently, ${top-level} var isn't defined either.
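
A guess, not verified: since ${top-level} looks like a plain system property, it may be
enough to hand it to the plugin, e.g. inside the jetty-maven-plugin configuration:

  <configuration>
    <systemProperties>
      <systemProperty>
        <name>top-level</name>
        <!-- placeholder path to the checkout -->
        <value>/path/to/lucene-solr</value>
      </systemProperty>
    </systemProperties>
  </configuration>

or as -Dtop-level=/path/to/lucene-solr on the mvn command line.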




On 19 June 2013 16:25, Dmitry Kan dmitry.luc...@gmail.com wrote:

 also: ${build-directory} is not set anywhere in the project.


 On 19 June 2013 16:23, Dmitry Kan dmitry.luc...@gmail.com wrote:

 Hello,

 executing 'package' on Apache Solr Search Server pom
 (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.

 Apache Maven 3.0.4
 OS: Ubuntu 12.04 LTS

 Thanks,

 Dmitry Kan





Re: List your chair on https://lucene.apache.org/whoweare.html?

2013-06-19 Thread Yonik Seeley
On Wed, Jun 19, 2013 at 8:56 AM, Benson Margulies bimargul...@gmail.com wrote:
 A small suggestion: identify the VP on the list of PMC and committers.

Why?
To the outside, this might suggest some sort of specialness that
doesn't exist for day to day development activities.
If someone has business with the PMC, they should email the PMC, not
individuals.

-Yonik
http://lucidworks.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: IndexWriter commit user data takes a map

2013-06-19 Thread Steve Rowe
Hi Varun,

LUCENE-4575 did not change IW's user data to a Map.  That was done in 
LUCENE-1654.
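
For what it's worth, a quick sketch of the Map-based API (Lucene 4.x; the keys below are
made up for illustration):

import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CommitUserDataExample {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_43, new StandardAnalyzer(Version.LUCENE_43)));

    // several independent pieces of metadata can be attached to one commit
    Map<String, String> userData = new HashMap<String, String>();
    userData.put("sourceTimestamp", "2013-06-19T12:00:00Z");
    userData.put("sourceVersion", "42");
    writer.setCommitData(userData);   // recorded with the next commit
    writer.commit();
    writer.close();

    // read it back from the commit point
    DirectoryReader reader = DirectoryReader.open(dir);
    System.out.println(reader.getIndexCommit().getUserData());
    reader.close();
    dir.close();
  }
}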

Steve

On Jun 19, 2013, at 6:57 AM, Varun Thacker varunthacker1...@gmail.com wrote:

 I was just curious as to why IW.setCommitData uses a map?
 
 Looking back at LUCENE-1382, when committing user data was introduced, it took 
 a string. 
 
 In LUCENE-4575 it was refactored and changed to a Map. From the comments I 
 couldn't really figure out why it was changed. 
 
 -- 
 
 
 Regards,
 Varun Thacker
 http://www.vthacker.in/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [lucene 4.3.1] solr webapp is put to null directory on maven build

2013-06-19 Thread Steve Rowe
Thanks for reporting, Dmitry, I'll take a look. - Steve

On Jun 19, 2013, at 10:06 AM, Dmitry Kan dmitry.luc...@gmail.com wrote:

 After adding:
 
 <build-directory>target</build-directory>
 
 the war file is put into the target subdir.
 
 
 On a side note:
 
 running solr with maven jetty plugin seem to work, which required two 
 artifacts (couldn't figure out where does jetty store the lib dir in this 
 mode):
 
 command. mvn jetty:run-war
 
 (configured in the jetty-maven-plugin):
 
   <dependencies>
     <dependency>
       <groupId>ch.qos.logback</groupId>
       <artifactId>logback-classic</artifactId>
       <version>1.0.13</version>
     </dependency>
     <dependency>
       <groupId>tomcat</groupId>
       <artifactId>commons-logging</artifactId>
       <version>4.0.6</version>
     </dependency>
   </dependencies>
 
 
 when starting the webapp, however, solr tries to create a collection1:
 
 17:02:53.108 [coreLoadExecutor-3-thread-1] INFO  
 org.apache.solr.core.CoreContainer - Creating SolrCore 'collection1' using 
 instanceDir: ${top-level}/solr/example/solr/collection1
 
 Apparently, ${top-level} var isn't defined either.
 
 
 
 
 On 19 June 2013 16:25, Dmitry Kan dmitry.luc...@gmail.com wrote:
 also: ${build-directory} is not set anywhere in the project.
 
 
 On 19 June 2013 16:23, Dmitry Kan dmitry.luc...@gmail.com wrote:
 Hello,
 
 executing 'package' on Apache Solr Search Server pom 
 (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.
 
 Apache Maven 3.0.4
 OS: Ubuntu 12.04 LTS
 
 Thanks,
 
 Dmitry Kan
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: List your chair on https://lucene.apache.org/whoweare.html?

2013-06-19 Thread Simon Willnauer
+1 on not specially marking it. If you really wanna know you can figure it
out via the asf website. I agree with yonik that the PMC should be
contacted!

simon


On Wed, Jun 19, 2013 at 4:13 PM, Yonik Seeley yo...@lucidworks.com wrote:

 On Wed, Jun 19, 2013 at 8:56 AM, Benson Margulies bimargul...@gmail.com
 wrote:
  A small suggestion: identify the VP on the list of PMC and committers.

 Why?
 To the outside, this might suggest some sort of specialness that
 doesn't exist for day to day development activities.
 If someone has business with the PMC, they should email the PMC, not
 individuals.

 -Yonik
 http://lucidworks.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: List your chair on https://lucene.apache.org/whoweare.html?

2013-06-19 Thread Mark Miller

On Jun 19, 2013, at 11:01 AM, Simon Willnauer simon.willna...@gmail.com wrote:

 +1 on not specially marking it. 

+1 - I like the way we currently handle this. 

- Mark


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4921) Support for Adding Documents via the Solr UI

2013-06-19 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-4921:
--

Attachment: SOLR-4921.patch

Here's a start on file upload.  It kind of works right now if you hit the 
submit button twice (after changing the QT option to /update/extract).  There 
seem to be some oddities with variable bindings for creating the document_url 
based on the handler path.

 Support for Adding Documents via the Solr UI
 

 Key: SOLR-4921
 URL: https://issues.apache.org/jira/browse/SOLR-4921
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, 
 SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, 
 SOLR-4921.patch


 For demos and prototyping, it would be nice if we could add documents via the 
 admin UI.
 Various things to support:
 1. Uploading XML, JSON, CSV, etc.
 2. Optionally also do file upload

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



solrj content-length header missing

2013-06-19 Thread Payne, Joe
We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an <add>...</add> document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?
Joe



This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.


[jira] [Commented] (SOLR-4916) Add support to write and read Solr index files and transaction log files to and from HDFS.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688094#comment-13688094
 ] 

Mark Miller commented on SOLR-4916:
---

It doesn't greatly affect other parts of Solr, it's not some big experimental 
change, so I intend to first commit to 5x and see how jenkins likes things and 
then backport to 4.x.

A lot of the core changes for this have slowly gone into 4.x long ago - 
including issues around making custom Directories first class in Solr and other 
little changes.

This builds to run against Apache Hadoop. I don't suspect that will be easily 
'pluggable', but it will be easy enough to change the ivy files to point to 
another Hadoop distro, fix any compile time errors (if there are any), run the 
tests, and build Solr.

Because our dependency is on client code that talks to hdfs, I suspect that it 
will work fine as is with most distros based on the same version of Apache 
Hadoop - and probably other versions as well in many cases.
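
(For illustration only: pointing the build at a different distro would mostly mean editing
the hadoop coordinates in the relevant ivy.xml files, along the lines of

  <dependency org="org.apache.hadoop" name="hadoop-client" rev="2.0.5-alpha"/>

with the org/name/rev swapped for the distro's artifacts; the module and revision shown
here are just an example, not necessarily what the patch ships with.)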



 Add support to write and read Solr index files and transaction log files to 
 and from HDFS.
 --

 Key: SOLR-4916
 URL: https://issues.apache.org/jira/browse/SOLR-4916
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
Assignee: Mark Miller
 Attachments: SOLR-4916.patch, SOLR-4916.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy

2013-06-19 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688104#comment-13688104
 ] 

Shawn Heisey commented on SOLR-4934:


I was getting ready to file an issue, glad I found this before doing so.  The 
only thing I knew was that LUCENE-5038 had caused Solr to make compound files, 
and that the useCompoundFile setting under indexConfig that I found in the branch_4x 
example wasn't turning it off.
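
For reference, the setting being discussed is the plain indexConfig form, i.e. roughly:

  <indexConfig>
    <useCompoundFile>false</useCompoundFile>
  </indexConfig>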

A connected discussion, for which I can file an issue if necessary: Assuming 
there are plenty of file descriptors available, will a user get better 
performance from compound files or separate files?  Is it dependent on other 
factors like filesystem choice, or is one a clear winner?  The outcome of that 
discussion should decide what Solr's default is when no related config options 
are used.


 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolices in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-06-19 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688113#comment-13688113
 ] 

David Smiley commented on LUCENE-4583:
--

Should the closed status and resolution change to "not a problem" mean that 
[~mikemccand]'s improvements in his patch here (that don't change the limit) 
won't get applied?  They looked good to me.  And you?

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.  The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5067) add a BaseDirectoryTestCase

2013-06-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688126#comment-13688126
 ] 

Michael McCandless commented on LUCENE-5067:


+1

 add a BaseDirectoryTestCase
 ---

 Key: LUCENE-5067
 URL: https://issues.apache.org/jira/browse/LUCENE-5067
 Project: Lucene - Core
  Issue Type: Test
Reporter: Robert Muir

 Currently most directory code is tested indirectly. But there are still 
 corner cases like LUCENE-5066, NRCachingDirectory.testNoDir, 
 TestRAMDirectory.testSeekToEOFThenBack, that only target specific directories 
 where some user reported the bug. If one of our other directories has these 
 bugs, the best we can hope for is some other lucene test will trip it 
 indirectly and we will find it after lots of debugging...
 Instead we should herd up all these tests into a base class and test every 
 directory explicitly and directly with it (like we do with the codec API).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: solrj content-length header missing

2013-06-19 Thread Uwe Schindler
Hi,

 

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content if the size is not known up front, which is the case for data 
sent by the solr client library unless it buffers the request completely (and 
full buffering would have a negative impact on response times and memory 
requirements). Depending on the size of the POST data, HttpSolrServer decides 
internally if it can set content-length (if the body is smaller than the buffer 
size and chunking is not needed) or not. This is handled by the underlying 
HttpClient library (http://hc.apache.org/).
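
To make the distinction concrete, a small sketch against the HttpClient 4.x API (just an
illustration, not SolrJ code):

import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;

import org.apache.http.HttpEntity;
import org.apache.http.entity.ByteArrayEntity;
import org.apache.http.entity.InputStreamEntity;

public class EntityLengthExample {
  public static void main(String[] args) {
    byte[] body = "<add><doc>...</doc></add>".getBytes(Charset.forName("UTF-8"));

    // fully buffered: the length is known up front, so a Content-Length header is sent
    HttpEntity buffered = new ByteArrayEntity(body);

    // streamed with unknown length (-1): HttpClient falls back to chunked encoding
    HttpEntity streamed = new InputStreamEntity(new ByteArrayInputStream(body), -1);

    System.out.println("buffered: " + buffered.getContentLength()
        + ", streamed: " + streamed.getContentLength());
  }
}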

 

What is the problem / error message of nginx?

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.org
Subject: solrj content-length header missing

 

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an <add>…</add> document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?

Joe

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688146#comment-13688146
 ] 

Yonik Seeley commented on SOLR-4926:


I hacked the lucene IWC and MergePolicy classes to never use compound format, 
and then started ChaosMonkeySafeLeaderTest tests in a loop.
11 passes in a row so far, so it definitely looks like these failures are 
related to the compound file format.
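
For reference, a rough sketch of one way to force non-compound segments, assuming the
post-LUCENE-5038 trunk API (useCompoundFile on IndexWriterConfig, the CFS ratio on the
MergePolicy); not necessarily the exact hack used here:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class NoCompoundFileConfig {
  public static IndexWriterConfig noCfs(Version matchVersion, Analyzer analyzer) {
    IndexWriterConfig iwc = new IndexWriterConfig(matchVersion, analyzer);
    iwc.setUseCompoundFile(false);     // newly flushed segments: separate files
    TieredMergePolicy mp = new TieredMergePolicy();
    mp.setNoCFSRatio(0.0);             // merged segments: never compound
    iwc.setMergePolicy(mp);
    return iwc;
  }
}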

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688150#comment-13688150
 ] 

Uwe Schindler commented on SOLR-4926:
-

How does this test depend on CFS or not?

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688150#comment-13688150
 ] 

Uwe Schindler edited comment on SOLR-4926 at 6/19/13 4:53 PM:
--

How does this test depend on CFS or not? So it looks like replication does not 
work correctly with CFS, which is a serious bug!

  was (Author: thetaphi):
How does this test depend on CFS or not?
  
 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688151#comment-13688151
 ] 

Yonik Seeley commented on SOLR-4926:


bq. How does this test depend on CFS or not?

That's the million dollar question :-)  It does not, explicitly, but it seems 
like the use of CFS somehow causes replication to fail.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: solrj content-length header missing

2013-06-19 Thread Payne, Joe
This is happening on version 1.2.7 of Nginx.  Newer versions do not produce 
this error, but getting that updated is another battle.  The error message it 
returns is 411: Length Required.

From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, June 19, 2013 12:29 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

Hi,

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library (http://hc.apache.org/).

What is the problem / error message of nginx?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de/
eMail: u...@thetaphi.de

From: Payne, Joe [mailto:joe.pa...@kroger.com]
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.org
Subject: solrj content-length header missing

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an <add>…</add> document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?
Joe



This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.



This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.


[jira] [Comment Edited] (SOLR-4916) Add support to write and read Solr index files and transaction log files to and from HDFS.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688094#comment-13688094
 ] 

Mark Miller edited comment on SOLR-4916 at 6/19/13 4:59 PM:


It doesn't greatly affect other parts of Solr, it's not some big experimental 
change, so I intend to first commit to 5x and see how jenkins likes things and 
then backport to 4.x.

A lot of the core changes for this have slowly gone into 4.x long ago - 
including issues around making custom Directories first class in Solr and other 
little changes.

This builds to run against Apache Hadoop 2.0.5-alpha. I don't suspect that will 
be easily 'pluggable', but it will be easy enough to change the ivy files to 
point to another Hadoop distro, fix any compile time errors (if there are any), 
run the tests, and build Solr.

Because our dependency is on client code that talks to hdfs, I suspect that it 
will work fine as is with most distros based on the same version of Apache 
Hadoop - and probably other versions as well in many cases.


  was (Author: markrmil...@gmail.com):
It doesn't greatly affect other parts of Solr, it's not some big 
experimental change, so I intend to first commit to 5x and see how jenkins 
likes things and then backport to 4.x.

A lot of the core changes for this have slowly gone into 4.x long ago - 
including issues around making custom Directories first class in Solr and other 
little changes.

This builds to run against Apache Hadoop. I don't suspect that will be easily 
'pluggable', but it will be easy enough to change the ivy files to point to 
another Hadoop distro, fix any compile time errors (if there are any), run the 
tests, and build Solr.

Because our dependency is on client code that talks to hdfs, I suspect that it 
will work fine as is with most distros based on the same version of Apache 
Hadoop - and probably other versions as well in many cases.


  
 Add support to write and read Solr index files and transaction log files to 
 and from HDFS.
 --

 Key: SOLR-4916
 URL: https://issues.apache.org/jira/browse/SOLR-4916
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
Assignee: Mark Miller
 Attachments: SOLR-4916.patch, SOLR-4916.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5006) Simplify / understand IndexWriter/DocumentsWriter synchronization

2013-06-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688159#comment-13688159
 ] 

Michael McCandless commented on LUCENE-5006:


+1, thanks Simon!

 Simplify / understand IndexWriter/DocumentsWriter synchronization
 -

 Key: LUCENE-5006
 URL: https://issues.apache.org/jira/browse/LUCENE-5006
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Attachments: LUCENE-5006.patch, LUCENE-5006.patch


 The concurrency in IW/DW/BD is terrifying: there are many locks involved, not 
 just intrinsic locks but IW also has fullFlushLock, commitLock, and there are 
 no clear rules about lock order to avoid deadlocks like LUCENE-5002.
 We have to somehow simplify this, and define the allowed concurrent behavior 
 eg when an app calls deleteAll while other threads are indexing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-06-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688162#comment-13688162
 ] 

Michael McCandless commented on LUCENE-4583:


I still think we should fix the limitation in core; this way apps that want to 
store large binary fields per-doc are able to use a custom DVFormat.

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.  The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: solrj content-length header missing

2013-06-19 Thread Uwe Schindler
See: http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 6:59 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

This is happening on version 1.2.7 of Nginx.  Newer versions do not produce 
this error, but getting that updated is another battle.  The error message it 
returns is 411: Length Required.

 

From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: Wednesday, June 19, 2013 12:29 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

Hi,

 

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library ( http://hc.apache.org/ 
http://hc.apache.org/).

 

What is the problem / error message of nginx?

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.org
Subject: solrj content-length header missing

 

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an <add>…</add> document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?

Joe

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.



[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688163#comment-13688163
 ] 

Yonik Seeley commented on LUCENE-4583:
--

bq. I still think we should fix the limitation in core; this way apps that want 
to store large binary fields per-doc are able to use a custom DVFormat.

+1
arbitrary limits are not a feature.

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.  The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: solrj content-length header missing

2013-06-19 Thread Payne, Joe
Thank you.  I will try that.

From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, June 19, 2013 1:07 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

See: http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de/
eMail: u...@thetaphi.de

From: Payne, Joe [mailto:joe.pa...@kroger.com]
Sent: Wednesday, June 19, 2013 6:59 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

This is happening on version 1.2.7 of Nginx.  Newer versions do not produce 
this error, but getting that updated is another battle.  The error message it 
returns is 411: Length Required.

From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, June 19, 2013 12:29 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

Hi,

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library (http://hc.apache.org/).

What is the problem / error message of nginx?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de/
eMail: u...@thetaphi.de

From: Payne, Joe [mailto:joe.pa...@kroger.com]
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.org
Subject: solrj content-length header missing

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an <add>…</add> document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?
Joe



This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.



This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.



This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.


RE: solrj content-length header missing

2013-06-19 Thread Uwe Schindler
Reading further, see the following statement:

http://wiki.nginx.org/NginxHttpChunkinModule

 

Status

This module is no longer needed for Nginx 1.3.9+ because since 1.3.9, the Nginx 
core already has built-in support for the chunked request bodies.

And this module is now only maintained for Nginx versions older than 1.3.9.

 

So you could install this module to make it work. The bug is on the Nginx side: the 
older versions do not support chunked request encoding, which is *required* by the 
HTTP/1.1 spec! A clear usability failure.

 

Solr does not know the body length without buffering, so it cannot send a length (see my 
mails before).

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: Wednesday, June 19, 2013 7:07 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

See: http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 6:59 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

This is happening on version 1.2.7 of Nginx.  Newer versions do not produce 
this error, but getting that updated is another battle.  The error message it 
returns is 411: Length Required.

 

From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: Wednesday, June 19, 2013 12:29 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

Hi,

 

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library ( http://hc.apache.org/ 
http://hc.apache.org/).

 

What is the problem / error message of nginx?

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.org
Subject: solrj content-length header missing

 

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an <add>…</add> document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?

Joe

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.



[jira] [Commented] (LUCENE-5066) TestFieldsReader fails in 4.x with OOM

2013-06-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688168#comment-13688168
 ] 

Michael McCandless commented on LUCENE-5066:


+1 patch looks good

Maybe we should pull out a public static final MAX_TERM_LENGTH_BYTES
in IndexWriter?  And DWPT references that, and this added assert in
TermBuffer.java uses it too?  Shai needed to use it recently as well...


 TestFieldsReader fails in 4.x with OOM
 --

 Key: LUCENE-5066
 URL: https://issues.apache.org/jira/browse/LUCENE-5066
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5066.patch


 Its FaultyIndexInput is broken (doesn't implement seek/clone correctly).
 This causes it to read bogus data and try to allocate an enormous byte[] for 
 a term.
 The bug was previously hidden:
 FaultyDirectory doesnt override openSlice, so CFS must not be used at flush 
 if you want to trigger the bug.
 FailtyIndexInput's clone is broken, it uses new but doesn't seek the clone 
 to the right place. This causes a disaster with BufferedIndexInput (which it 
 extends), because BufferedIndexInput (not just the delegate) must know its 
 position since it has seek-within-block etc code...
 It seems with this test (very simple one), that only 3.x codec triggers it 
 because its term dict relies upon clone()'s being seek'd to right place. 
 I'm not sure what other codecs rely upon this, but imo we should also add a 
 low-level test for directories that does something like this to ensure its 
 really tested:
 {code}
 dir.createOutput(x);
 dir.openInput(x);
 input.seek(somewhere);
 clone = input.clone();
 assertEquals(somewhere, clone.getFilePointer());
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy

2013-06-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688179#comment-13688179
 ] 

Hoss Man edited comment on SOLR-4934 at 6/19/13 5:25 PM:
-

bq. The only thing I knew was that LUCENE-5038 had caused Solr to make compound 
files and the useCompoundFile setting under indexConfig that I found in the 
branch_4x example wasn't turning it off.

Oh ... hmmm, yeah ... i hadn't noticed that.  definitely a bug there.  I've 
opened SOLR-4941 to track that, and we'll leave this issue specifically about 
the broken initargs config option.

*EDIT:* fixed issue number

  was (Author: hossman):
bq. The only thing I knew was that LUCENE-5038 had caused Solr to make 
compound files and the useCompoundFile setting under indexConfig that I found 
in the branch_4x example wasn't turning it off.

Oh ... hmmm, yeah ... i hadn't noticed that.  definitely a bug there.  I've 
opened SOLR-4926 to track that, and we'll leave this issue specifically about 
the broken initargs config option.


  
 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolices in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

2013-06-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned SOLR-4941:
--

Assignee: Hoss Man

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man

 Spin off of SOLR-4934.  We should updated tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy

2013-06-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688179#comment-13688179
 ] 

Hoss Man commented on SOLR-4934:


bq. The only thing I knew was that LUCENE-5038 had caused Solr to make compound 
files and the useCompoundFile setting under indexConfig that I found in the 
branch_4x example wasn't turning it off.

Oh ... hmmm, yeah ... i hadn't noticed that.  definitely a bug there.  I've 
opened SOLR-4926 to track that, and we'll leave this issue specifically about 
the broken initargs config option.



 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolices in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

2013-06-19 Thread Hoss Man (JIRA)
Hoss Man created SOLR-4941:
--

 Summary: useCompoundFile default has changed, simple config option 
no longer seems to work
 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man


Spin off of SOLR-4934.  We should updated tests to ensure that the various ways 
of specifying useCompoundFile as well as the expected default are working 
properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy

2013-06-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-4934.


Resolution: Fixed

merged r1494348 - 4x as r1494696

 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolices in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-06-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688185#comment-13688185
 ] 

Michael McCandless commented on LUCENE-5030:


The easy performance tester to run is
lucene/suggest/src/test/org/apache/lucene/search/suggest/LookupBenchmarkTest.java
... we should test that first I think?  I can also run one based on
FreeDB ... the sources are in luceneutil
(https://code.google.com/a/apache-extras.org/p/luceneutil/ ).

If the perf hit is too much then one option would be to make it
optional (whether we count edits in Unicode space or UTF-8 space), or
maybe just another suggester class (FuzzyUnicodeSuggester?).

I think we can use INFO_SEP: yes, this is used for PAYLOAD_SEP, but
that only means the incoming surfaceForm cannot contain this char, I
think?  So ... I think we are free to use it in the analyzed form?  Or
did something go wrong when you tried?

Whichever chars we use (steal), we should add checks that these chars do not
occur in the input...
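
Something along these lines maybe (just a sketch -- the method, and which byte we
actually reserve, are made up here, not taken from the current code):

{code}
// org.apache.lucene.util.BytesRef
private static void checkReservedChars(BytesRef surfaceForm, byte reservedByte) {
  for (int i = 0; i < surfaceForm.length; i++) {
    if (surfaceForm.bytes[surfaceForm.offset + i] == reservedByte) {
      throw new IllegalArgumentException(
          "surface form cannot contain reserved byte 0x"
          + Integer.toHexString(reservedByte & 0xff) + " at position " + i);
    }
  }
}
{code}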


 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688190#comment-13688190
 ] 

Uwe Schindler commented on SOLR-4934:
-

bq. Assuming there are plenty of file descriptors available, will a user get 
better performance from compound files or separate files?

Searching on the index will have no negative impact: IndexInputSlicer returns 
optimized IndexInputs that already account for the offsets inside the compound 
file. Indexing speed is identical, too, but merging (done in the background) is 
more expensive.

 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolicies in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat

2013-06-19 Thread Subhash Karemore (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688201#comment-13688201
 ] 

Subhash Karemore commented on SOLR-4939:


Hi,

I think you are right. I am not too familiar with the Linux environment.
Could you please tell me the exact command for allowing a TCP connection so that I
am able to connect to the remote Oracle DB using Java? I searched a lot for this
problem; however, I didn't find the exact command/solution.

I appreciate your help.

Regards,
Subhash



 Not able to import oracle DB on RedHat
 --

 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore

 I have configured my RedHat system for Solr. After that I started the solr, 
 it is started properly. I have to import the Oracle DB for indexing. My data 
 config file is.
 <dataConfig>
   <dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver"
       url="jdbc:oracle:thin:@//hostname:2126/DBNAme" user="user"
       password="Passwd" batchSize="1" />
   <document>
     <entity name="table1"
         query="SELECT ID, col2, col3 FROM table1 WHERE rownum BETWEEN 1 AND 1000">
       <field column="ID" name="id" />
       <field column="col2" name="col2" />
       <field column="col3" name="col3" />
     </entity>
   </document>
 </dataConfig>
 I have done similar changes for schema.xml file.
 I have copied the solr-dataimporthandler-4.3.0.jar, 
 solr-dataimporthandler-extras-4.3.0.jar, solr-solrj-4.3.0.jar from dist 
 folder to ../lib folder. Also I have downloaded ojdbc6.jar and put in same 
 folder.
 With this setting, it is working properly on Windows. However on RedHat, it 
 is not working. It is giving me errors when I try to index DB.
 Below are the errors which I got on console.
 ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while 
 processing: table1 document : 
 SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
  Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
 BETWEEN 1 AND 1000 Processing Document # 1
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
 Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
 could not establish the connection
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
 at 
 oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
 at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
 ... 12 more
 Caused by: oracle.net.ns.NetException: The Network Adapter could not 
 establish the connection
 at 

[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688217#comment-13688217
 ] 

Uwe Schindler commented on SOLR-4939:
-

Ask your firewall administrator, we have no idea about your environment and 
cannot help!

A quick test to see whether it works at all is to enter the following in a shell 
(needs netcat installed):

{code}
nc hostname_of_oracle_server 2126
{code}

If this also times out, ask somebody who knows your network.
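
If netcat is not available, a tiny standalone Java check does the same thing (the
host and port below are placeholders -- take the real values from your JDBC URL):

{code}
import java.net.InetSocketAddress;
import java.net.Socket;

public class OraclePortCheck {
  public static void main(String[] args) throws Exception {
    String host = "hostname_of_oracle_server"; // placeholder
    int port = 2126;                           // placeholder
    Socket socket = new Socket();
    try {
      socket.connect(new InetSocketAddress(host, port), 5000); // 5 second timeout
      System.out.println("TCP connection succeeded");
    } finally {
      socket.close();
    }
  }
}
{code}

If this fails with a connect or timeout exception when run on the machine where
Solr runs, the problem is in the network/firewall, not in Solr or the JDBC driver.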

 Not able to import oracle DB on RedHat
 --

 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore

 I have configured my RedHat system for Solr. After that I started the solr, 
 it is started properly. I have to import the Oracle DB for indexing. My data 
 config file is.
 <dataConfig>
   <dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver"
       url="jdbc:oracle:thin:@//hostname:2126/DBNAme" user="user"
       password="Passwd" batchSize="1" />
   <document>
     <entity name="table1"
         query="SELECT ID, col2, col3 FROM table1 WHERE rownum BETWEEN 1 AND 1000">
       <field column="ID" name="id" />
       <field column="col2" name="col2" />
       <field column="col3" name="col3" />
     </entity>
   </document>
 </dataConfig>
 I have done similar changes for schema.xml file.
 I have copied the solr-dataimporthandler-4.3.0.jar, 
 solr-dataimporthandler-extras-4.3.0.jar, solr-solrj-4.3.0.jar from dist 
 folder to ../lib folder. Also I have downloaded ojdbc6.jar and put in same 
 folder.
 With this setting, it is working properly on Windows. However on RedHat, it 
 is not working. It is giving me errors when I try to index DB.
 Below are the errors which I got on console.
 ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while 
 processing: table1 document : 
 SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
  Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
 BETWEEN 1 AND 1000 Processing Document # 1
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
 Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
 could not establish the connection
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
 at 
 oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
 at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
 ... 12 more
 Caused by: oracle.net.ns.NetException: The Network Adapter could not 
 establish the connection
 at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392)
 at 
 

[jira] [Commented] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

2013-06-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688270#comment-13688270
 ] 

Hoss Man commented on SOLR-4941:


I understand what happened now...

when simon asked on the mailing list for help reviewing the solr changes 
affected by LUCENE-5038 i didn't fully understand the scope of the change, and 
only focused on how it affected the existing MergePolicy settings (SOLR-4934) 
-- but i only noticed that setUseCompoundFile had been removed from the merge 
policies in favor of only using the ratio -- i didn't realize that 
setUseCompoundFile was actually moved to IndexWriterConfig.

i'll work up a patch to make the existing solr settings apply to the 
IndexWriterConfig.

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man

 Spin off of SOLR-4934.  We should update tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: IndexWriter commit user data takes a map

2013-06-19 Thread Varun Thacker
Hi Steve,

Thanks for pointing it out.

I was actually looking at SOLR-2701 when I started wondering why it takes a Map
instead of a string identifier.

So I'm guessing this should be left untouched?
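
For reference (purely a sketch -- the keys are made up, and writer is an open
IndexWriter), the Map form is used like this today:

  Map<String,String> userData = new HashMap<String,String>();
  userData.put("sourceVersion", "42");  // arbitrary, application-defined keys
  userData.put("indexedAt", Long.toString(System.currentTimeMillis()));
  writer.setCommitData(userData);       // recorded with the next commit
  writer.commit();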



On Wed, Jun 19, 2013 at 7:55 PM, Steve Rowe sar...@gmail.com wrote:

 Hi Varun,

 LUCENE-4575 did not change IW's user data to a Map.  That was done in
 LUCENE-1654.

 Steve

 On Jun 19, 2013, at 6:57 AM, Varun Thacker varunthacker1...@gmail.com
 wrote:

  I was just curious as to why IW.setCommitData uses a map ?
 
  Looking back at LUCENE-1382 when committing user data was introduced it
 took a string.
 
  In LUCENE-4575 it was refactored and changed to a Map. From the comments
 I couldn't really figure out why was it changed.
 
  --
 
 
  Regards,
  Varun Thacker
  http://www.vthacker.in/


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 


Regards,
Varun Thacker
http://www.vthacker.in/


[jira] [Commented] (SOLR-1301) Solr + Hadoop

2013-06-19 Thread Alexander Kanarsky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688330#comment-13688330
 ] 

Alexander Kanarsky commented on SOLR-1301:
--

[~otis], do you mean to use the Solr query result as a MapReduce job input?

 Solr + Hadoop
 -

 Key: SOLR-1301
 URL: https://issues.apache.org/jira/browse/SOLR-1301
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki 
 Fix For: 4.4

 Attachments: commons-logging-1.0.4.jar, 
 commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, 
 hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, 
 log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, 
 SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SolrRecordWriter.java


 This patch contains  a contrib module that provides distributed indexing 
 (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is 
 twofold:
 * provide an API that is familiar to Hadoop developers, i.e. that of 
 OutputFormat
 * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
 SolrOutputFormat consumes data produced by reduce tasks directly, without 
 storing it in intermediate files. Furthermore, by using an 
 EmbeddedSolrServer, the indexing task is split into as many parts as there 
 are reducers, and the data to be indexed is not sent over the network.
 Design
 --
 Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
 which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
 instantiates an EmbeddedSolrServer, and it also instantiates an 
 implementation of SolrDocumentConverter, which is responsible for turning 
 Hadoop (key, value) into a SolrInputDocument. This data is then added to a 
 batch, which is periodically submitted to EmbeddedSolrServer. When reduce 
 task completes, and the OutputFormat is closed, SolrRecordWriter calls 
 commit() and optimize() on the EmbeddedSolrServer.
 The API provides facilities to specify an arbitrary existing solr.home 
 directory, from which the conf/ and lib/ files will be taken.
 This process results in the creation of as many partial Solr home directories 
 as there were reduce tasks. The output shards are placed in the output 
 directory on the default filesystem (e.g. HDFS). Such part-N directories 
 can be used to run N shard servers. Additionally, users can specify the 
 number of reduce tasks, in particular 1 reduce task, in which case the output 
 will consist of a single shard.
 An example application is provided that processes large CSV files and uses 
 this API. It uses a custom CSV processing to avoid (de)serialization overhead.
 This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
 issue, you should put it in contrib/hadoop/lib.
 Note: the development of this patch was sponsored by an anonymous contributor 
 and approved for release under Apache License.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-1301) Solr + Hadoop

2013-06-19 Thread Alexander Kanarsky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688330#comment-13688330
 ] 

Alexander Kanarsky edited comment on SOLR-1301 at 6/19/13 7:17 PM:
---

[~otis], do you mean to use the Solr query result as a MapReduce job input?
Also, regarding SOLR-1045, it is a different approach (in the Map phase vs. the 
Reduce phase - a great explanation by Ted is up here: 
https://issues.apache.org/jira/browse/SOLR-1301#comment-12828961)

  was (Author: kanarsky):
[~otis], do you mean to use the Solr query result as a MapReduce job input?
  
 Solr + Hadoop
 -

 Key: SOLR-1301
 URL: https://issues.apache.org/jira/browse/SOLR-1301
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki 
 Fix For: 4.4

 Attachments: commons-logging-1.0.4.jar, 
 commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, 
 hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, 
 log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, 
 SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SolrRecordWriter.java


 This patch contains  a contrib module that provides distributed indexing 
 (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is 
 twofold:
 * provide an API that is familiar to Hadoop developers, i.e. that of 
 OutputFormat
 * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
 SolrOutputFormat consumes data produced by reduce tasks directly, without 
 storing it in intermediate files. Furthermore, by using an 
 EmbeddedSolrServer, the indexing task is split into as many parts as there 
 are reducers, and the data to be indexed is not sent over the network.
 Design
 --
 Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
 which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
 instantiates an EmbeddedSolrServer, and it also instantiates an 
 implementation of SolrDocumentConverter, which is responsible for turning 
 Hadoop (key, value) into a SolrInputDocument. This data is then added to a 
 batch, which is periodically submitted to EmbeddedSolrServer. When reduce 
 task completes, and the OutputFormat is closed, SolrRecordWriter calls 
 commit() and optimize() on the EmbeddedSolrServer.
 The API provides facilities to specify an arbitrary existing solr.home 
 directory, from which the conf/ and lib/ files will be taken.
 This process results in the creation of as many partial Solr home directories 
 as there were reduce tasks. The output shards are placed in the output 
 directory on the default filesystem (e.g. HDFS). Such part-N directories 
 can be used to run N shard servers. Additionally, users can specify the 
 number of reduce tasks, in particular 1 reduce task, in which case the output 
 will consist of a single shard.
 An example application is provided that processes large CSV files and uses 
 this API. It uses a custom CSV processing to avoid (de)serialization overhead.
 This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
 issue, you should put it in contrib/hadoop/lib.
 Note: the development of this patch was sponsored by an anonymous contributor 
 and approved for release under Apache License.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5069:
--

 Summary: Can/should we store NumericField's precisionStep in the 
index?
 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless


I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
failing to hit the expected docs ... and it was because s/he had indexed with 
precStep=4 but searched with precStep=1.

Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe 
store precStep in FieldInfo, and then fail at search time if you use a 
non-matching precStep?

I think you can index fine and then search on a multiple of that?  E.g., I can 
index with precStep=2 but search with precStep=8?  But indexing with precStep=4 
and searching precStep=1 won't work ...
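
To make the trap concrete, a small sketch (the field name and values are made up;
the classes are org.apache.lucene.document.* and org.apache.lucene.search.NumericRangeQuery):

{code}
FieldType ft = new FieldType(IntField.TYPE_NOT_STORED);
ft.setNumericPrecisionStep(4);                 // indexed with precStep=4
Document doc = new Document();
doc.add(new IntField("price", 42, ft));
// ... add doc to an IndexWriter as usual ...

// Matches: the query step (8) is a multiple of the indexed step (4).
Query ok = NumericRangeQuery.newIntRange("price", 8, 0, 100, true, true);

// Silently misses documents: precStep=1 terms were never indexed.
Query broken = NumericRangeQuery.newIntRange("price", 1, 0, 100, true, true);
{code}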

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688339#comment-13688339
 ] 

Mark Miller commented on SOLR-4926:
---

bq. the use of CFS somehow causes replication to fail

Yeah, this is what I'm seeing - I just caught a really good sample case with 
decent logging.

The recovering replica commits on the leader and that leader then has 126 docs 
to replicate.

16 documents end up on the replica after the replication - 110 short.

The leader is on gen 3, the replica on gen 1.

Perhaps a red herring, but in the many cases of this I've looked at, oddly, no 
buffered docs are ever replayed after that - though I have seen buffered docs 
replayed in those same runs when the replication did not fail. Weird 
observation.

Anyway, I need to turn on more replication level logging I think.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688339#comment-13688339
 ] 

Mark Miller edited comment on SOLR-4926 at 6/19/13 7:23 PM:


bq. the use of CFS somehow causes replication to fail

Yeah, this is what I'm seeing - I just caught a really good sample case with 
decent logging.

The recovering replica commits on the leader and that leader then has 126 docs 
to replicate.

16 documents end up on the replica after the replication - 110 short.

Before the replication, the leader is on gen 3, the replica on gen 1.

Perhaps a red herring, but in the many cases of this I've looked at, oddly, no 
buffered docs are ever replayed after that - though I have seen buffered docs 
replayed in those same runs when the replication did not fail. Weird 
observation.

Anyway, I need to turn on more replication level logging I think.

  was (Author: markrmil...@gmail.com):
bq. the use of CFS somehow causes replication to fail

Yeah, this is what I'm seeing - I just caught a really good sample case with 
decent logging.

The recovering replica commits on the leader and that leader then has 126 docs 
to replicate.

16 documents end up on the replica after the replication - 110 short.

The leader is on gen 3, the replica on gen 1.

Perhaps a red herring, but in the many cases of this I've looked at, oddly, no 
buffered docs are ever replayed after that - though I have seen buffered docs 
replayed in those same runs when the replication did not fail. Weird 
observation.

Anyway, I need to turn on more replication level logging I think.
  
 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688346#comment-13688346
 ] 

Uwe Schindler commented on LUCENE-5069:
---

I think we can do this.  I had the same in mind, but lots of people were 
against for schema reasons (you know, no schema info in index). If we save 
precision step we should also save type like we do for stored fields.

The point that the search works with a multiple of the original precision step is correct, btw

While indexing, adding a new item with different step should also fail.  The 
check on indexing show would be done in the TermsEnum initialization of mtq's 
getTermsEnum().

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688346#comment-13688346
 ] 

Uwe Schindler edited comment on LUCENE-5069 at 6/19/13 7:30 PM:


I think we can do this.  I had the same in mind, but lots of people were 
against for schema reasons (you know, no schema info in index). If we save 
precision step we should also save type like we do for stored fields.

The point that the search works with a multiple of the original precision step is correct, btw

While indexing, adding a new item with different step should also fail.  The 
check on searching would be done in the TermsEnum initialization of mtq's 
getTermsEnum().

  was (Author: thetaphi):
I think we can do this.  I had the same in mind, but lots of people were 
against for schema reasons (you know, no schema info in index). If we save 
precision step we should also save type like we do for stored fields.

The search works with multiple of original precision step is correct, btw

While indexing, adding a new item with different step should also fail.  The 
check on indexing show would be done in the TermsEnum initialization of mtq's 
getTermsEnum().
  
 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Estimating Solr memory requirements

2013-06-19 Thread Dmitry Kan
Hi Erick,

Is typo in the title on purpose?


On 19 June 2013 15:09, Erick Erickson erickerick...@gmail.com wrote:

 OK, I seem to have stalled on this. Over part of the winter, I put
 together a Swing-based program to help estimate Solr/Lucene memory
 requirements, with all the usual caveats; see:
 https://github.com/ErickErickson/SolrMemoryEsitmator.

 I have notes to myself that it's still deficient in several areas:
 FieldValueCache estimates
 tlog requirements
 Memory required to re-open a searcher
 Position and term vector memory requirements
 And whatever I haven't thought about yet.

 Of course it builds on Grant's spreadsheet (read: steals from it
 shamelessly!) I'm hoping to have a friendlier interface. And _of
 course_ I'd be willing to donate it to Solr as a util/contrib/whatever
 if it fits.

 So, what I'm about here is a few things:

  Anyone who wants to try it feel free. The build instructions are at the
 above, but the short form is to clone it, ant jar and java -jar
 dist/estimator.jar. Enter some field info and hit the Add/Save button
 then hit the Dump calcs button to see what it does currently.

 It also saves the estimates away in a file and shows all the steps it
 goes through to perform the calculations. It'll also make rudimentary
 field definitions from the entered data. You can come back to it later
 and add to what you've already done.

  Make any improvements you see fit, particularly to flesh out the
 deficiencies listed above.

  Anyone who has, you know, graphic design/Swing skills please feel free
 to make it better. I'm a newbie as far as using Swing is concerned, and the
 way I align buttons and checkboxes is pretty hacky. But it works

  Any suggestions anyone wants to make. Suggestions in code are nicest of
 course, but algorithms for calculating, say, position and tv memory usage
 would be great as well! Isolated code snippets that I could incorporate
 would be great too.

  Any info where I've gotten the calculations wrong or don't show enough
 info to actually figure out whether they're correct or not.

 Note that the goal for this is to give a rough idea of memory
 requirements and be easy to use. The spreadsheet is a bit daunting to
 someone who knows nothing about Solr so this might be an easier way to
 get into it.

 Thanks,
 Erick

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688355#comment-13688355
 ] 

Adrien Grand commented on LUCENE-5069:
--

bq. While indexing, adding a new item with different step should also fail.

+1 This motivation is enough for me to store the precision step in the field 
info.

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688359#comment-13688359
 ] 

Uwe Schindler commented on LUCENE-5069:
---

With this info in FieldInfo we could automatically select the right precision 
step for each atomic reader processed while the query runs. 
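
Purely to illustrate the idea (the attribute key and the surrounding plumbing are
invented here, nothing like this exists yet):

{code}
// At index time: record the step on the field's FieldInfo.
fieldInfo.putAttribute("numericPrecisionStep", Integer.toString(precisionStep));

// At query time, per atomic reader: read it back and adapt or validate.
String s = fieldInfo.getAttribute("numericPrecisionStep");
if (s != null) {
  int indexedStep = Integer.parseInt(s);
  // e.g. fail fast if the query step is not a multiple of the indexed step
  if (queryPrecisionStep % indexedStep != 0) {
    throw new IllegalStateException("precisionStep mismatch for field " + fieldInfo.name);
  }
}
{code}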

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

2013-06-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4941:
---

Attachment: infostream.txt
SOLR-4941.patch

Patch that improves the tests and updates the logic added in SOLR-4934 so that 
if there is explicit useCompoundFile configuration as an init arg for a (known) 
MergePolicy we pass that to the IndexWriterConfig's setUseCompoundFile method 
and log a warning instead of just ignoring it.
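
In rough pseudo-Java, the intended behaviour (not the literal patch; the variable
names here are made up):

{code}
// If a legacy useCompoundFile init arg is present on the mergePolicy, honour it on the IWC and warn.
Object legacyUseCFS = mergePolicyInitArgs.remove("useCompoundFile");
if (legacyUseCFS != null) {
  log.warn("useCompoundFile is no longer a MergePolicy setting; applying it to IndexWriterConfig instead");
  iwc.setUseCompoundFile(Boolean.parseBoolean(legacyUseCFS.toString()));
}
{code}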

patch also removes the warnings about the simple legacy useCompoundFile 
syntax since that actually makes sense now that it's a setting on IWC.

I've also updated the tests to inspect the useCompoundFile on the IWC as well 
as checking the results of adding some segments.

there is still a failure in testTieredMergePolicyConfig where (as i understand 
it from talking to mike on IRC) the merged segment after the optimize command 
should *not* be in CFS format because of the noCFSRatio setting -- but the 
merged segment is still in CFS. i've attached the infostream log from running 
ant test -Dtestcase=TestMergePolicyConfig 
-Dtests.method=testTieredMergePolicyConfig to see if it helps illuminate the 
problem ... i suspect it's either a test bug because i still misunderstand 
something about how the MergePolicy settings come into play, or a genuine bug 
in the lower level TieredMP code -- i don't see how it could be specific to the 
solr config parsing logic since the IWC and TMP getters say they got the 
expected settings.

(NOTE: the patch includes a nocommit in solrconfig-mergepolicy.xml to turn off 
the infostream before committing)

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: infostream.txt, SOLR-4941.patch


 Spin off of SOLR-4934.  We should update tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688393#comment-13688393
 ] 

Robert Muir commented on LUCENE-5069:
-

{quote}
 I had the same in mind, but lots of people were against for schema reasons 
(you know, no schema info in index). If we save precision step we should also 
save type like we do for stored fields.
{quote}

Count me as one of those: I'm worried about how the issue has already jumped to this.


 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3838) Admin UI - Multiple filter queries are not supported in Query UI

2013-06-19 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3838:


Attachment: SOLR-3838.patch

Updated patch; it includes the change to focus on the last possible row after deletion.

Will commit that shortly.

 Admin UI - Multiple filter queries are not supported in Query UI
 

 Key: SOLR-3838
 URL: https://issues.apache.org/jira/browse/SOLR-3838
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0-BETA
Reporter: Jack Krupansky
Assignee: Stefan Matheis (steffkes)
 Fix For: 5.0, 4.4

 Attachments: screenshot-1.jpg, SOLR-3838.patch, SOLR-3838.patch, 
 SOLR-3838.patch, SOLR-3838.patch


 The Solr Admin Query UI has only a single fq input field, which does not 
 permit the user to enter multiple filter query parameters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688405#comment-13688405
 ] 

Robert Muir commented on LUCENE-5069:
-

{quote}
With this info in FieldInfo we could automatically select the right precision 
step for each atomic reader processed while the query runs. 
{quote}

The problem is it's too late: QueryParser/Query are independent of readers, so 
they don't know to generate the correct query (e.g. NumericRangeQuery instead of 
TermRangeQuery) in the first place!

So this issue misses the forest for the trees, sorry, -1 to a halfass schema 
that brings all of the problems of a schema and none of the benefits!

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3838) Admin UI - Multiple filter queries are not supported in Query UI

2013-06-19 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-3838.
-

Resolution: Implemented

Committed in ..
trunk: r1494762
branch_4x: r1494763

 Admin UI - Multiple filter queries are not supported in Query UI
 

 Key: SOLR-3838
 URL: https://issues.apache.org/jira/browse/SOLR-3838
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0-BETA
Reporter: Jack Krupansky
Assignee: Stefan Matheis (steffkes)
 Fix For: 5.0, 4.4

 Attachments: screenshot-1.jpg, SOLR-3838.patch, SOLR-3838.patch, 
 SOLR-3838.patch, SOLR-3838.patch


 The Solr Admin Query UI has only a single fq input field, which does not 
 permit the user to enter multiple filter query parameters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4456) Admin UI: Displays dashboard even if Solr is down

2013-06-19 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-4456.
-

   Resolution: Fixed
Fix Version/s: 5.0

committed the current state in
trunk r1494765
branch_4x r1494768

if there are suggestions for tweaking it, please open a new ticket for that

 Admin UI: Displays dashboard even if Solr is down
 -

 Key: SOLR-4456
 URL: https://issues.apache.org/jira/browse/SOLR-4456
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.1
Reporter: Jan Høydahl
Assignee: Stefan Matheis (steffkes)
 Fix For: 5.0, 4.4

 Attachments: SOLR-4456.patch, SOLR-4456.patch, SOLR-4456.patch


 1. Run Solr and bring up the Admin dashboard
 2. Stop Solr
 3. Click around the Admin GUI. It apparently works, but displays a spinning 
 wheel for most panels
 4. Click on Dashboard. An old cached dashboard is displayed
 What should happen is that once connection to Solr is lost, the whole Admin 
 UI displays a large red box CONNECTION LOST or something :) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-4583) StraightBytesDocValuesField fails if bytes 32k

2013-06-19 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-4583:



 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688440#comment-13688440
 ] 

Mark Miller commented on SOLR-4926:
---

Reviewing some more sample fails of RecoveryZkTest:

It actually looks like after the replication we end up with one commit point 
back - e.g. we are trying to replicate gen 3 and the replica moves from gen 1 to gen 
2.

- Mark

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

2013-06-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688444#comment-13688444
 ] 

Michael McCandless commented on SOLR-4941:
--

Indeed I can see that TMP has noCFSRatio=0.6, and two segments are flushed and 
turned into CFS, then those two segments are merged, and then the merged 
segment is turned into a CFS.

I think this means that the merged segment's files (pre-CFS) are < 0.6 the size 
of the two flushed CFS segments ... e.g. maybe the CFS headers of the first 2 
segments are tipping the scale?  Try indexing more docs for each segment maybe?
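
(Toy numbers, purely to illustrate the arithmetic: if each flushed CFS segment is
roughly 10 KB, of which a few KB is compound-file overhead, the merged pre-CFS
segment could easily come out around 11 KB; 11 / 20 = 0.55 < 0.6, so TMP would
still choose the compound format for the merged segment.)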

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: infostream.txt, SOLR-4941.patch


 Spin off of SOLR-4934.  We should update tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4719) Admin UI - Default to wt=json on Query-Screen

2013-06-19 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-4719.
-

   Resolution: Implemented
Fix Version/s: 5.0

committed in 
trunk r1494772
branch_4x r1494774

 Admin UI - Default to wt=json on Query-Screen
 -

 Key: SOLR-4719
 URL: https://issues.apache.org/jira/browse/SOLR-4719
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 5.0, 4.4


 I didn't really notice that we're still using {{wt=xml}} as default on the 
 Query-Screen .. i suggest we change that to {{wt=json}} .. it's 2013 =)
 Syntax-Highlight would still work, even if one tries the 
 example-configuration where the content-type is overwritten with text/plain, 
 since it's based on the selection on the left side :)
 Any objections?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3546) Add index page to Admin UI

2013-06-19 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-3546.
-

Resolution: Duplicate
  Assignee: Stefan Matheis (steffkes)

 Add index page to Admin UI
 --

 Key: SOLR-3546
 URL: https://issues.apache.org/jira/browse/SOLR-3546
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Lance Norskog
Assignee: Stefan Matheis (steffkes)
Priority: Minor

 It would be great to index a file by uploading it. In designing schemas and 
 testing features I often make one or two test documents. It would be great to 
 upload these directly from the UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2440) Schema Browser more user friendly

2013-06-19 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688456#comment-13688456
 ] 

Stefan Matheis (steffkes) commented on SOLR-2440:
-

[~jcodina] WDYT? If it's covered i'm going to close this one

 Schema Browser more user friendly
 -

 Key: SOLR-2440
 URL: https://issues.apache.org/jira/browse/SOLR-2440
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Affects Versions: 1.4.1
 Environment: The schema browser of the admin web application
Reporter: Joan Codina
Priority: Minor
  Labels: browser, schema
 Fix For: 4.4

 Attachments: LUCENE_4_schema_jsp.patch, LUCENE_4_screen_css.patch, 
 schema_jsp.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The schema browser has some drawbacks
 * Does not sort the fields (the current ordering seems arbitrary)
 * Capitalises all field names, which makes them hard to match
 * Does not allow drilling down
 This small patch solves the three issues: 
 #  Changes the CSS so that the links are no longer capitalised
 #  Sorts the field names
 #  Replaces the tokens with links to a search query for that token
 that's all  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Estimating Solr memory requirements

2013-06-19 Thread Erick Erickson
Nope, never even noticed it until now. That's the right URL though,
typo and all

Someday I may even fix it <G>...

Thanks,
Erick

On Wed, Jun 19, 2013 at 3:35 PM, Dmitry Kan dmitry.luc...@gmail.com wrote:
 Hi Erick,

 Is the typo in the title on purpose?


 On 19 June 2013 15:09, Erick Erickson erickerick...@gmail.com wrote:

 OK, I seem to have stalled on this. Over part of the winter, I put
 together a Swing-based program to help estimate Solr/Lucene memory
 requirements, with all the usual caveats; see:
 https://github.com/ErickErickson/SolrMemoryEsitmator.

 I have notes to myself that it's still deficient in several areas:
 FieldValueCache estimates
 tlog requirements
 Memory required to re-open a searcher
 Position and term vector memory requirements
 And whatever I haven't thought about yet.

 Of course it builds on Grant's spreadsheet (read: steals from it
 shamelessly!) I'm hoping to have a friendlier interface. And _of
 course_ I'd be willing to donate it to Solr as a util/contrib/whatever
 if it fits.

 So, what I'm about here is a few things:

  Anyone who wants to try it, feel free. The build instructions are at the
  link above, but the short form is to clone it, run "ant jar" and then run
  "java -jar dist/estimator.jar". Enter some field info and hit the Add/Save button,
  then hit the Dump calcs button to see what it does currently.

 It also saves the estimates away in a file and shows all the steps it
 goes through to perform the calculations. It'll also make rudimentary
 field definitions from the entered data. You can come back to it later
 and add to what you've already done.

  Make any improvements you see fit, particularly to flesh out the
  deficiencies listed above.

  Anyone who has, you know, graphic design/Swing skills please feel free
  to make it better. I'm a newbie as far as using Swing is concerned, and the
  way I align buttons and checkboxes is pretty hacky. But it works

  Any suggestions anyone wants to make. Suggestions in code are nicest of
  course, but algorithms for calculating, say, position and tv memory usage
  would be great as well! Isolated code snippets that I could incorporate
  would be great too.

  Any info where I've gotten the calculations wrong or don't show enough
  info to actually figure out whether they're correct or not.

 Note that the goal for this is to give a rough idea of memory
 requirements and be easy to use. The spreadsheet is a bit daunting to
 someone who knows nothing about Solr so this might be an easier way to
 get into it.

 Thanks,
 Erick

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688522#comment-13688522
 ] 

Mark Miller commented on SOLR-4926:
---

In the case where the slave is on gen 2, it did just download the files for gen 
3 - so it seems we are not picking up the latest commit point somehow..
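
For anyone digging into this, a quick diagnostic sketch (not part of the test; it assumes {{dir}} is the replica's index Directory and uses the org.apache.lucene.index classes) that prints every commit point Lucene can see after replication:

{code}
// List the commit points visible in the Directory; if the gen 3 segments file
// was replicated but the commit isn't being picked up, it won't show up here.
// (listCommits throws IOException; run inside a method that declares it)
for (IndexCommit commit : DirectoryReader.listCommits(dir)) {
  System.out.println(commit.getSegmentsFileName() + " gen=" + commit.getGeneration());
}
{code}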

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5066) TestFieldsReader fails in 4.x with OOM

2013-06-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688532#comment-13688532
 ] 

Robert Muir commented on LUCENE-5066:
-

I mentioned this in the email: should we do it here under this issue?

re above: I think we should spin off an issue to improve the codec checks (so 
we get assert fails at least, rather than OOM), i imagine this would be part of 
that issue, but can do it here too.

 TestFieldsReader fails in 4.x with OOM
 --

 Key: LUCENE-5066
 URL: https://issues.apache.org/jira/browse/LUCENE-5066
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5066.patch


 Its FaultyIndexInput is broken (doesn't implement seek/clone correctly).
 This causes it to read bogus data and try to allocate an enormous byte[] for 
 a term.
 The bug was previously hidden:
 FaultyDirectory doesn't override openSlice, so CFS must not be used at flush 
 if you want to trigger the bug.
 FaultyIndexInput's clone is broken: it creates a new instance but doesn't seek the 
 clone to the right place. This causes a disaster with BufferedIndexInput (which it 
 extends), because BufferedIndexInput (not just the delegate) must know its 
 position, since it has seek-within-block etc. code...
 It seems with this test (a very simple one) that only the 3.x codec triggers it, 
 because its term dict relies upon clones being seek'd to the right place. 
 I'm not sure what other codecs rely upon this, but imo we should also add a 
 low-level test for directories that does something like this to ensure it's 
 really tested:
 {code}
 dir.createOutput(x);
 dir.openInput(x);
 input.seek(somewhere);
 clone = input.clone();
 assertEquals(somewhere, clone.getFilePointer());
 {code}
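
A slightly more fleshed-out version of that sketch, as it might look against the 4.x store API (the file name, contents and seek offset are arbitrary; this is only an illustration, not the committed test):

{code}
// Write a small file, seek the input, clone it, and check the clone's position.
IndexOutput out = dir.createOutput("x", IOContext.DEFAULT);
out.writeBytes(new byte[1024], 1024);
out.close();
IndexInput input = dir.openInput("x", IOContext.DEFAULT);
input.seek(500);                           // some position inside the file
IndexInput clone = input.clone();
assertEquals(500, clone.getFilePointer()); // a broken clone() would fail here
{code}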

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

2013-06-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4941:
---

Attachment: SOLR-4941.patch

bq. maybe the CFS headers of the first 2 segments are tipping the scale? Try 
indexing more docs for each segment maybe?

yeah .. i guess i was just naive in considering 0.6 a low enough threshold.

i increased the size of the docs and the number of docs per segment -- and when 
that still didn't work i also decreased the ratio to 0.1 and that seemed to do 
the trick.

updated patch fixes the test, removes the nocommit, and updates the upgrading 
instructions in CHANGES.txt (still need an explicit Bug Fix entry though)

still running more test iters, but i think this is pretty good.

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch


 Spin off of SOLR-4934.  We should update tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

2013-06-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4941:
---

Fix Version/s: 4.4
   5.0

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4

 Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch


 Spin off of SOLR-4934.  We should update tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688589#comment-13688589
 ] 

Adriano Crestani commented on LUCENE-5069:
--

Couldn't the standard flexible query parser be used for that? I know you can 
configure numeric fields in it before parsing a query. I think there is a wiki 
about it, just can't find it, maybe Uwe remembers where it is. For now you can 
take a look at TestNumericQueryParser.

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...
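
To make the failure mode concrete, a minimal sketch of the mismatch (Lucene 4.x API; the field name, values and the existing {{doc}} Document are assumptions for illustration only):

{code}
// Index an int field with precisionStep=4 ...
FieldType ft = new FieldType(IntField.TYPE_NOT_STORED);
ft.setNumericPrecisionStep(4);
ft.freeze();
doc.add(new IntField("price", 42, ft));

// ... then search with precisionStep=1: the query asks for trie terms that were
// never indexed, so matching docs are silently missed.
Query broken = NumericRangeQuery.newIntRange("price", 1, 0, 100, true, true);

// Searching with the same precisionStep used at index time works as expected.
Query ok = NumericRangeQuery.newIntRange("price", 4, 0, 100, true, true);
{code}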

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4618) Integrate LucidWorks' Solr Reference Guide with Solr documentation

2013-06-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688623#comment-13688623
 ] 

Hoss Man commented on SOLR-4618:


FYI: Things have kind of been in a holding pattern for a while now ... first i 
was waiting for some confirmation from Infra to proceed, then Gavin in Infra 
said he wanted to do a full backup first and be online during the import, then 
after playing jira & irc message tag for a bit (Gavin and i are in diametrically 
opposed timezones) Infra announced that they are upgrading CWIKI to Confluence 
5.x.

I _think_ the current plan is to import the data into the current wiki sometime 
in the next day or so before the upgrade, but it may happen as part of the 
upgrade, or perhaps after the upgrade ... i really don't know.

 Integrate LucidWorks' Solr Reference Guide with Solr documentation
 --

 Key: SOLR-4618
 URL: https://issues.apache.org/jira/browse/SOLR-4618
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.1
Reporter: Cassandra Targett
Assignee: Hoss Man
 Attachments: NewSolrStyle.css, SolrRefGuide4.1-ASF.zip, 
 SolrRefGuide.4.3.zip


 LucidWorks would like to donate the Apache Solr Reference Guide, maintained 
 by LucidWorks tech writers, to the Solr community. It was first produced in 
 2009 as a download-only PDF for Solr 1.4, but since 2011 it has been online 
 at http://docs.lucidworks.com/display/solr/ and updated for Solr 3.x releases 
 and for Solr 4.0 and 4.1.
 I've prepared an XML export from our Confluence installation, which can be 
 easily imported into the Apache Confluence installation by someone with 
 system admin rights. The doc has not yet been updated for 4.2, so it covers 
 Solr 4.1 so far. I'll add some additional technical notes about the export 
 itself in a comment. 
 Since we use Confluence at LucidWorks, I can also offer assistance getting 
 Confluence set up, importing this package into it, or any other help needed 
 for the community to start using this. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688631#comment-13688631
 ] 

Robert Muir commented on LUCENE-5069:
-

Sure but then you basically have 2 schemas :)

Alternatively we could argue NumericRangeQuery is something that a QP should 
never generate anyway: instead maybe QPs should only worry about user intent 
and generate a RangeQuery, which rewrite()s to the correct type...

My point is we should just think these things through without introducing 
additional schema-like things into Lucene, since we already have enough of them 
(Analyzer configuration, for example, is a form of schema maintained by the 
user).

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #885: POMs out of sync

2013-06-19 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/885/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
shard1 is not consistent.  Got 305 from 
http://127.0.0.1:64102/g_d/x/collection1lastClient and got 253 from 
http://127.0.0.1:63228/g_d/x/collection1

Stack Trace:
java.lang.AssertionError: shard1 is not consistent.  Got 305 from 
http://127.0.0.1:64102/g_d/x/collection1lastClient and got 253 from 
http://127.0.0.1:63228/g_d/x/collection1
at 
__randomizedtesting.SeedInfo.seed([201755EC8EA7E3B9:A1F1DBF4F9F88385]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1018)
at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:238)




Build Log:
[...truncated 23632 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

2013-06-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-4941.


Resolution: Fixed

Committed revision 1494837.
Committed revision 1494839.


 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4

 Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch


 Spin off of SOLR-4934.  We should update tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4942) Add more randomized testing of compound file format and random merge policies

2013-06-19 Thread Hoss Man (JIRA)
Hoss Man created SOLR-4942:
--

 Summary: Add more randomized testing of compound file format and 
random merge policies
 Key: SOLR-4942
 URL: https://issues.apache.org/jira/browse/SOLR-4942
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man


SOLR-4926 seems to have uncovered some sporadic cloud/replication bugs related 
to using compound files.

We should update SolrTestCaseJ4 and the majority of our test configs to better 
randomize the usage of compound files and merge policies.

Step #1...

* update test configs to use 
{{<useCompoundFile>${useCompoundFile:false}</useCompoundFile>}}
* update SolrTestCaseJ4 to toggle that sys property randomly (a rough sketch follows below)

Step #2...

* add a new RandomMergePolicy that implements MergePolicy by proxying to 
another instance selected at creation using one of the 
LuceneTestCase.new...MergePolicy methods
* updated test configs to refer to this new MergePolicy
* borrow the tests.shardhandler.randomSeed logic in SolrTestCaseJ4 to give 
our RandomMergePolicy a consistent seed at runtime.
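
A minimal sketch of what step #1 could look like on the test side (the property name matches the config snippet above; the method name and placement are illustrative, not the committed change):

{code}
// In SolrTestCaseJ4 (sketch): decide compound-file usage once per test class, so
// every config that reads ${useCompoundFile:false} sees the same randomized value.
@BeforeClass
public static void randomizeUseCompoundFile() {
  System.setProperty("useCompoundFile", Boolean.toString(random().nextBoolean()));
}
{code}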


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688807#comment-13688807
 ] 

Mark Miller commented on SOLR-4926:
---

I've been focusing on the RecoveryZkTest case.

Every fail I've looked at has used the RAM dir. Odd because the safe leader 
test that fails is hard coded to not use ramdir I think. RecoveryZkTest also 
uses mock dir, but I don't think the safe leader test does because of the hard 
coding to standard dir.

Anyway, more on what I'm seeing from the RecoveryZkTest fails:

we replicate gen 3 files, we reopen the writer and then the searcher using that 
writer - we get an index of gen 2 - the files from the searcher's directory 
don't contain the newly replicated files, just the gen 2 index files.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org