[jira] [Resolved] (SOLR-12327) 'bin/solr status' should be able to produce output as pure JSON
[ https://issues.apache.org/jira/browse/SOLR-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal resolved SOLR-12327. Resolution: Not A Problem Didn't realize that *curl 'http://localhost:8983/solr/admin/info/system'* will return pure JSON with that information > 'bin/solr status' should be able to produce output as pure JSON > --- > > Key: SOLR-12327 > URL: https://issues.apache.org/jira/browse/SOLR-12327 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: scripts and tools > Affects Versions: 7.2 > Reporter: Simon Rosenthal > Priority: Minor > > The 'bin/solr status' command should optionally (so as not to break back compat) produce its output in pure JSON, for easier parsing, rather than a mixture of free text and JSON as it does at present. > e.g. > {{prompt# bin/solr status purejson}} > {{ {}} > {{# these two lines replace "Solr process on port "}} > {{ *"solr_port":"8983",*}} > {{ *"solr_pid":"14020",*}} > {{ "solr_home":"/home/user/solr-7.2.1/server/solr",}} > {{ "version":"7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:54:21",}} > {{ "startTime":"2018-05-02T13:51:43.388Z",}} > {{ "uptime":"5 days, 2 hours, 39 minutes, 35 seconds",}} > {{ "memory":"1.2 GB (%25.7) of 4.8 GB",}} > {{ "cloud":{}} > {{ "ZooKeeper":"localhost:9983",}} > {{ "liveNodes":"1",}} > {{ "collections":"1" > The use case here is mapping a Solr port (where that is the only available information about the Solr instance) to ZK host/port(s) for a subsequent call to zkcli.sh. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
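The workaround in the resolution can be scripted: fetch the system-info endpoint and pull the ZooKeeper connection string out of the JSON, which covers the port-to-ZK-host use case described in the issue. A minimal sketch; the payload below is a mock shaped like the real endpoint's response, not captured from a live instance:

```python
import json

# Mock of an (abridged) response from:
#   curl 'http://localhost:8983/solr/admin/info/system?wt=json'
# Field names mirror the endpoint's shape; values are illustrative.
raw = """{
  "mode": "solrcloud",
  "zkHost": "localhost:9983",
  "solr_home": "/home/user/solr-7.2.1/server/solr"
}"""

def zk_host_for(info_json: str) -> str:
    """Map a Solr instance's system-info payload to its ZooKeeper host:port,
    e.g. for a follow-up call to zkcli.sh."""
    info = json.loads(info_json)
    if info.get("mode") != "solrcloud":
        raise ValueError("instance is not running in SolrCloud mode")
    return info["zkHost"]

print(zk_host_for(raw))  # → localhost:9983
```

Against a live instance the same function would be fed the body of the curl request quoted above.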
[jira] [Comment Edited] (SOLR-12327) 'bin/solr status' should be able to produce output as pure JSON
[ https://issues.apache.org/jira/browse/SOLR-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466234#comment-16466234 ] Simon Rosenthal edited comment on SOLR-12327 at 5/7/18 5:39 PM: Didn't realize that *curl 'http://localhost:8983/solr/admin/info/system'* will return pure JSON with that information was (Author: simon.rosenthal): Didn't realize that *curl 'http://localhost:8983/solr/admin/info/system'* will return pure JON with that information > 'bin/solr status' should be able to produce output as pure JSON > --- > > Key: SOLR-12327 > URL: https://issues.apache.org/jira/browse/SOLR-12327 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: scripts and tools > Affects Versions: 7.2 > Reporter: Simon Rosenthal > Priority: Minor > > The 'bin/solr status' command should optionally (so as not to break back compat) produce its output in pure JSON, for easier parsing, rather than a mixture of free text and JSON as it does at present. > e.g. > {{prompt# bin/solr status purejson}} > {{ {}} > {{# these two lines replace "Solr process on port "}} > {{ *"solr_port":"8983",*}} > {{ *"solr_pid":"14020",*}} > {{ "solr_home":"/home/user/solr-7.2.1/server/solr",}} > {{ "version":"7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:54:21",}} > {{ "startTime":"2018-05-02T13:51:43.388Z",}} > {{ "uptime":"5 days, 2 hours, 39 minutes, 35 seconds",}} > {{ "memory":"1.2 GB (%25.7) of 4.8 GB",}} > {{ "cloud":{}} > {{ "ZooKeeper":"localhost:9983",}} > {{ "liveNodes":"1",}} > {{ "collections":"1" > The use case here is mapping a Solr port (where that is the only available information about the Solr instance) to ZK host/port(s) for a subsequent call to zkcli.sh. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SOLR-12327) 'bin/solr status' should be able to produce output as pure JSON
Simon Rosenthal created SOLR-12327: -- Summary: 'bin/solr status' should be able to produce output as pure JSON Key: SOLR-12327 URL: https://issues.apache.org/jira/browse/SOLR-12327 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: scripts and tools Affects Versions: 7.2 Reporter: Simon Rosenthal The 'bin/solr status' command should optionally (so as not to break back compat) produce its output in pure JSON, for easier parsing, rather than a mixture of free text and JSON as it does at present. e.g. {{prompt# bin/solr status purejson}} {{ {}} {{# these two lines replace "Solr process on port "}} {{ *"solr_port":"8983",*}} {{ *"solr_pid":"14020",*}} {{ "solr_home":"/home/user/solr-7.2.1/server/solr",}} {{ "version":"7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:54:21",}} {{ "startTime":"2018-05-02T13:51:43.388Z",}} {{ "uptime":"5 days, 2 hours, 39 minutes, 35 seconds",}} {{ "memory":"1.2 GB (%25.7) of 4.8 GB",}} {{ "cloud":{}} {{ "ZooKeeper":"localhost:9983",}} {{ "liveNodes":"1",}} {{ "collections":"1" The use case here is mapping a Solr port (where that is the only available information about the Solr instance) to ZK host/port(s) for a subsequent call to zkcli.sh. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (SOLR-8767) RealTimeGetComponent and stored/copyField exclusion
[ https://issues.apache.org/jira/browse/SOLR-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461800#comment-16461800 ] Simon Rosenthal commented on SOLR-8767: --- +1 on changing the behavior. Was just bitten by this real-time get behavior in a situation where I was using a copyField to effectively rename a --> b (a was then defined as non-indexed, non-stored). Not the intended use of copyField, I know, and I could probably use a FieldNameMutatingUpdateProcessor instead (though this results in including field name manipulations in solrconfig.xml, where they really don't belong, rather than in the schema). > RealTimeGetComponent and stored/copyField exclusion > --- > > Key: SOLR-8767 > URL: https://issues.apache.org/jira/browse/SOLR-8767 > Project: Solr > Issue Type: Bug > Components: SolrCloud > Reporter: Erik Hatcher > Priority: Critical > > Consider this scenario: schema has fields `a` and `b` defined, both stored. A copyField is defined from `a` => `b`. A document is indexed with `id=1; b="foo"`. A real-time /get will not return field `b` because RealTimeGetComponent.toSolrDoc currently excludes copyField destinations (despite, in this situation, the source of that copyField not being sent in). Granted this is a bit of a diabolical case (discovered while tinkering with cloud MLT tests), but isn't that far fetched to happen in the wild. Maybe real-time /get should return all fields set as stored, regardless of copyField status? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
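The scenario in the issue description can be sketched as a toy model (this is not Solr's actual code, just an illustration of the exclusion being discussed): real-time /get filters out copyField destinations, so a document indexed with only the destination field loses it.

```python
# Schema assumption for this sketch: copyField a -> b, both stored.
copy_field_destinations = {"b"}
stored_doc = {"id": "1", "b": "foo"}  # doc was indexed with b only; a was never sent

def rtg_view(doc: dict) -> dict:
    """Mimics RealTimeGetComponent.toSolrDoc dropping copyField targets."""
    return {k: v for k, v in doc.items() if k not in copy_field_destinations}

print(rtg_view(stored_doc))  # → {'id': '1'}  (field b silently disappears)
```

The proposed fix amounts to filtering on "is the field stored" rather than "is the field a copyField destination".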
[jira] [Resolved] (SOLR-10840) Random Index Corruption during bulk indexing
[ https://issues.apache.org/jira/browse/SOLR-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal resolved SOLR-10840. Resolution: Cannot Reproduce After moving our production Solr server to a new AWS instance, the problem disappeared. Heaven knows why. > Random Index Corruption during bulk indexing > > > Key: SOLR-10840 > URL: https://issues.apache.org/jira/browse/SOLR-10840 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: update > Affects Versions: 6.3, 6.5.1 > Environment: AWS EC2 instance running Centos 7 > Reporter: Simon Rosenthal > > I'm seeing a randomly occurring index corruption exception during a Solr data ingest. This can occur anywhere during the 7-8 hours our ingests take. I'm initially submitting this as a Solr bug as this is the environment I'm using, but it does look as though the error is occurring in Lucene code. > Some background: > AWS EC2 server running CentOS 7 > java.runtime.version: 1.8.0_131-b11 (also occurred with 1.8.0_45). > Solr 6.3.0 (have also seen it with Solr 6.5.1). It did not happen with Solr 5.4 (which I can't go back to). Oddly enough, I ran Solr 6.3.0 uneventfully for several weeks before this problem first occurred. > Standalone (non-cloud) environment. > Our indexing subsystem is a complex Python script which creates multiple indexing subprocesses in order to make use of multiple cores. Each subprocess reads records from a MySQL database, does some significant preprocessing and sends a batch of documents (defaults to 500) to the Solr update handler (using the Python 'scorched' module). Each content source (there are 5-6) requires a separate instantiation of the script, and these are wrapped in a Bash script to run serially.
> > When the exception occurs, we always see something like the following in > the solr.log > > ERROR - 2017-06-06 14:37:34.639; [ x:stresstest1] > org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: > Exception writing document id med-27840-00384802 to the index; possible > analysis error. > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178 > ... > Caused by: org.apache.lucene.store.AlreadyClosedException: this > IndexWriter is closed > at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:740) > at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:754) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1558) > at > org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:279) > at > org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211) > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166) > ... 
42 more > Caused by: java.io.EOFException: read past EOF: > MMapIndexInput(path="/indexes/solrindexes/stresstest1/index/_441.nvm") > at > org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75) > at > org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41) > at org.apache.lucene.store.DataInput.readInt(DataInput.java:101) > at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194) > at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255) > at > org.apache.lucene.codecs.lucene53.Lucene53NormsProducer.(Lucene53NormsProducer.java:58) > at > org.apache.lucene.codecs.lucene53.Lucene53NormsFormat.normsProducer(Lucene53NormsFormat.java:82) > at > org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:113) > at org.apache.lucene.index.SegmentReader.(SegmentReader.java:74) > at > org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145) > at > org.apache.lucene.index.BufferedUpdatesStream$SegmentState.(BufferedUpdatesStream.java:384) > at > org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:416) > at > org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:261) > at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4068) > at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4026) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3880) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626) > Suppressed: org.apache.lucene.index.CorruptIndexException: checksum >
[jira] [Created] (SOLR-10840) Random Index Corruption during bulk indexing
Simon Rosenthal created SOLR-10840: -- Summary: Random Index Corruption during bulk indexing Key: SOLR-10840 URL: https://issues.apache.org/jira/browse/SOLR-10840 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: update Affects Versions: 6.5.1, 6.3 Environment: AWS EC2 instance running Centos 7 Reporter: Simon Rosenthal I'm seeing a randomly occurring index corruption exception during a Solr data ingest. This can occur anywhere during the 7-8 hours our ingests take. I'm initially submitting this as a Solr bug as this is the environment I'm using, but it does look as though the error is occurring in Lucene code. Some background: AWS EC2 server running CentOS 7 java.runtime.version: 1.8.0_131-b11 (also occurred with 1.8.0_45). Solr 6.3.0 (have also seen it with Solr 6.5.1). It did not happen with Solr 5.4 (which I can't go back to). Oddly enough, I ran Solr 6.3.0 uneventfully for several weeks before this problem first occurred. Standalone (non-cloud) environment. Our indexing subsystem is a complex Python script which creates multiple indexing subprocesses in order to make use of multiple cores. Each subprocess reads records from a MySQL database, does some significant preprocessing and sends a batch of documents (defaults to 500) to the Solr update handler (using the Python 'scorched' module). Each content source (there are 5-6) requires a separate instantiation of the script, and these are wrapped in a Bash script to run serially. When the exception occurs, we always see something like the following in the solr.log ERROR - 2017-06-06 14:37:34.639; [ x:stresstest1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception writing document id med-27840-00384802 to the index; possible analysis error. at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178 ...
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:740) at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:754) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1558) at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:279) at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166) ... 42 more Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/indexes/solrindexes/stresstest1/index/_441.nvm") at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75) at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41) at org.apache.lucene.store.DataInput.readInt(DataInput.java:101) at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194) at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255) at org.apache.lucene.codecs.lucene53.Lucene53NormsProducer.(Lucene53NormsProducer.java:58) at org.apache.lucene.codecs.lucene53.Lucene53NormsFormat.normsProducer(Lucene53NormsFormat.java:82) at org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:113) at org.apache.lucene.index.SegmentReader.(SegmentReader.java:74) at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145) at org.apache.lucene.index.BufferedUpdatesStream$SegmentState.(BufferedUpdatesStream.java:384) at org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:416) at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:261) at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4068) at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4026) at 
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3880) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626) Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/indexes/solrindexes/stresstest1/index/_441.nvm"))) at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:451) at org.apache.lucene.codecs.lucene53.Lucene53NormsProducer.(Lucene53NormsProducer.java:63) ... 12 more This is usually followed in very short order by similar
[jira] [Created] (SOLR-10674) Don't delete core.properties when a core is unloaded
Simon Rosenthal created SOLR-10674: -- Summary: Don't delete core.properties when a core is unloaded Key: SOLR-10674 URL: https://issues.apache.org/jira/browse/SOLR-10674 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 6.5.1, 6.3 Environment: Centos 7 on AWS, Java 8 Reporter: Simon Rosenthal Priority: Minor In earlier versions of Solr, unloading a core caused its core.properties file to be renamed to something like core.properties.unloaded. The current behavior (observed in 6.3.0 and 6.5.1, running in non-cloud mode) is to delete core.properties, which is extremely inconvenient if it is anticipated that the core will be reloaded. Here's the logfile entry for the unload request: INFO - 2017-05-11 09:39:23.960; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/cores params={core=dev0510&action=UNLOAD&wt=json&_=1494513368797} status=0 QTime=624 Please consider restoring the previous behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
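Until the old behavior is restored, one client-side workaround is to copy core.properties aside before issuing the UNLOAD. A minimal sketch; the backup filename imitates the old `.unloaded` convention, and the demo directory is a throwaway created for illustration:

```python
import pathlib
import shutil
import tempfile

def backup_core_properties(core_dir: str) -> pathlib.Path:
    """Copy core.properties to core.properties.unloaded before an UNLOAD,
    so the core can be re-registered later."""
    src = pathlib.Path(core_dir) / "core.properties"
    dst = src.with_name("core.properties.unloaded")
    shutil.copy(src, dst)
    return dst

# Demo against a throwaway core directory (contents are illustrative):
core = tempfile.mkdtemp()
(pathlib.Path(core) / "core.properties").write_text("name=dev0510\n")
backup = backup_core_properties(core)
print(backup.name)  # → core.properties.unloaded
```

In practice this would run just before the `/admin/cores?action=UNLOAD` request shown in the log entry above.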
[jira] [Updated] (SOLR-8978) Support comment lines in CSV input files
[ https://issues.apache.org/jira/browse/SOLR-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-8978: -- Description: There is what looks like latent support for identifying and skipping comment lines in the CSV update handler (CSVStrategy.java and CSVParser.java), but it is not exposed. It would be a useful improvement to enable this functionality via a request parameter (e.g. commentCharacter="#") (was: There is latent support for identifying and skipping comment lines in the CSV update handler (CSVStrategy.java and CSVParser.java), but it is not exposed. It would be a useful improvement to enable this functionality via a request parameter (e.g. commentCharacter="#")) > Support comment lines in CSV input files > > > Key: SOLR-8978 > URL: https://issues.apache.org/jira/browse/SOLR-8978 > Project: Solr > Issue Type: Improvement > Components: update > Reporter: Simon Rosenthal > Priority: Minor > > There is what looks like latent support for identifying and skipping comment lines in the CSV update handler (CSVStrategy.java and CSVParser.java), but it is not exposed. It would be a useful improvement to enable this functionality via a request parameter (e.g. commentCharacter="#") -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SOLR-8978) Support comment lines in CSV input files
Simon Rosenthal created SOLR-8978: - Summary: Support comment lines in CSV input files Key: SOLR-8978 URL: https://issues.apache.org/jira/browse/SOLR-8978 Project: Solr Issue Type: Improvement Components: update Reporter: Simon Rosenthal Priority: Minor There is latent support for identifying and skipping comment lines in the CSV update handler (CSVStrategy.java and CSVParser.java), but it is not exposed. It would be a useful improvement to enable this functionality via a request parameter (e.g. commentCharacter="#") -- This message was sent by Atlassian JIRA (v6.3.4#6332)
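Since the commentCharacter parameter is only a proposal, a client-side stand-in is to strip comment lines before posting the CSV to the update handler. A minimal sketch (the function name and sample data are illustrative):

```python
import csv

def strip_comments(text: str, comment_char: str = "#") -> list:
    """Drop lines starting with comment_char, then parse the rest as CSV."""
    kept = (line for line in text.splitlines()
            if not line.lstrip().startswith(comment_char))
    return list(csv.reader(kept))

sample = "# exported 2016-04-12\nid,title\n1,hello\n# trailing note\n2,world\n"
print(strip_comments(sample))  # → [['id', 'title'], ['1', 'hello'], ['2', 'world']]
```

A real pre-processor would re-serialize the kept rows and POST them to /update, but the filtering step is the part the proposed parameter would replace.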
[jira] [Commented] (SOLR-8740) use docValues by default
[ https://issues.apache.org/jira/browse/SOLR-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184988#comment-15184988 ] Simon Rosenthal commented on SOLR-8740: --- If this is adopted, it needs to be clearly documented that DocValues do not retain ordering in multivalued fields whereas stored fields do. Our use case - picking first and last authors from a multivalued 'authors' String field. > use docValues by default > > > Key: SOLR-8740 > URL: https://issues.apache.org/jira/browse/SOLR-8740 > Project: Solr > Issue Type: Improvement > Affects Versions: master > Reporter: Yonik Seeley > Fix For: master > > > We should consider switching to docValues for most of our non-text fields. > This may be a better default since it is more NRT friendly and acts to avoid OOM errors due to large field cache or UnInvertedField entries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
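The ordering caveat in the comment can be made concrete. Multivalued string docValues in Lucene (SORTED_SET) come back sorted with duplicates removed, while stored fields preserve insertion order; a toy illustration with Python standing in for both behaviors:

```python
# Stored field: values come back in the order they were indexed.
authors_stored = ["Smith J", "Adams K", "Brown L"]

# SORTED_SET docValues: values come back sorted, duplicates removed.
authors_docvalues = sorted(set(authors_stored))

print(authors_stored[0], authors_stored[-1])        # → Smith J Brown L (true first/last)
print(authors_docvalues[0], authors_docvalues[-1])  # → Adams K Smith J (order lost)
```

For the first-author/last-author use case above, only the stored representation gives the right answer.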
[jira] [Created] (SOLR-7088) bin/solr script will not start Solr if CDPATH environment variable is set
Simon Rosenthal created SOLR-7088: - Summary: bin/solr script will not start Solr if CDPATH environment variable is set Key: SOLR-7088 URL: https://issues.apache.org/jira/browse/SOLR-7088 Project: Solr Issue Type: Bug Components: scripts and tools Affects Versions: 5.0 Environment: Ubuntu 14.4, BASH 4.3 Reporter: Simon Rosenthal If the Bash environment variable 'CDPATH' is set, it changes the behavior of the 'cd' command such that it prints out the directory which is the target of the 'cd'. This breaks the bin/solr script at SOLR_TIP=`cd $SOLR_TIP; pwd` DEFAULT_SERVER_DIR=$SOLR_TIP/server As a result, SOLR_TIP will then contain the directory name twice, with disastrous results subsequently. The fix is to add `unset CDPATH` early in the script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
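The bash behavior behind this bug is easy to reproduce: when CDPATH resolves a relative `cd` target, `cd` echoes the resolved directory to stdout, so a `$(cd dir; pwd)` capture picks up the path twice. A sketch driving bash from Python (assumes bash and /tmp exist; the script body is illustrative, not bin/solr itself):

```python
import subprocess

script = r"""
CDPATH=/
captured=$(cd tmp; pwd)   # mimics: SOLR_TIP=`cd $SOLR_TIP; pwd`
echo "$captured"          # prints the path twice: cd echoed it, then pwd did
unset CDPATH              # the fix proposed in the issue
captured=$(cd /tmp; pwd)
echo "$captured"          # prints the path once
"""
out = subprocess.run(["bash", "-c", script],
                     capture_output=True, text=True).stdout.splitlines()
print(out)
```

With CDPATH set, `captured` holds "/tmp" on two lines; after `unset CDPATH` it holds it once, which is why the one-line fix is sufficient.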
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976946#comment-13976946 ] Simon Rosenthal commented on SOLR-5871: --- +1 to add this functionality - at least an enumeration of the fields where there were matches. I could live without having the matching terms. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way:
| Name | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes | No | Yes | No | Yes |
| Jane Doe | No | Yes | No | No | Yes |
All these columns are indexed text fields and I need to know what content matched the query; it would also be cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug output is suitable for visual review but has lots of nesting levels and is hard to understand. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
[ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840622#comment-13840622 ] Simon Rosenthal commented on SOLR-4722: --- I also needed to remove the line with the unnecessary import (import org.apache.lucene.index.StoredDocument) as that class is not in Solr 4.4. After that, it worked like a charm. Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled. --- Key: SOLR-4722 URL: https://issues.apache.org/jira/browse/SOLR-4722 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 4.3, 5.0 Reporter: Tricia Jenkins Priority: Minor Attachments: SOLR-4722.patch, SOLR-4722.patch, solr-positionshighlighter.jar As an alternative to returning snippets, this highlighter provides the (term) position for query matches. One usecase for this is to reconcile the term position from the Solr index with 'word' coordinates provided by an OCR process. In this way we are able to 'highlight' an image, like a page from a book or an article from a newspaper, in the locations that match the user's query. This is based on the FastVectorHighlighter and requires that termVectors, termOffsets and termPositions be stored. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
[ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822537#comment-13822537 ] Simon Rosenthal commented on SOLR-4722: --- Great patch! I'd like to use the code as the basis for a component which will simply return term positions for each query term - no need for having highlighting enabled as a prerequisite, or to return term offsets - this is a text mining project where we'll be running queries in batch mode and storing this information externally. Can you think of any gotchas I might encounter? Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled. --- Key: SOLR-4722 URL: https://issues.apache.org/jira/browse/SOLR-4722 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 4.3, 5.0 Reporter: Tricia Jenkins Priority: Minor Attachments: SOLR-4722.patch, solr-positionshighlighter.jar As an alternative to returning snippets, this highlighter provides the (term) position for query matches. One usecase for this is to reconcile the term position from the Solr index with 'word' coordinates provided by an OCR process. In this way we are able to 'highlight' an image, like a page from a book or an article from a newspaper, in the locations that match the user's query. This is based on the FastVectorHighlighter and requires that termVectors, termOffsets and termPositions be stored. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
[ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822893#comment-13822893 ] Simon Rosenthal commented on SOLR-4722: --- Just one oddity - there are references to the StoredDocument class in getUniqueKeys() which (as far as I can see) is only in trunk - and I'm using Lucene/Solr 4.5.1. I replaced that with Document, which compiles OK, but I haven't had a chance to try it out yet. Do you think it should work? -Simon On Thursday, November 14, 2013 12:21 PM, Tricia Jenkins (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822636#comment-13822636 ] Tricia Jenkins commented on SOLR-4722: -- Thanks for your interest. This code/jar could be used as is for your purposes. If you don't want to specify highlighting enabled in each query, just move it to conf/solrconfig.xml: {code:xml} <requestHandler name="standard" class="solr.StandardRequestHandler"> <lst name="defaults"> <str name="hl">true</str> </lst> </requestHandler> {code} This highlighter only returns the term positions. The term offsets are stored because they're used by the FastVectorHighlighter. You won't get any useful information from this highlighter if you disable termOffsets in your schema.xml. I just ran this patch against trunk. Still works! -- This message was sent by Atlassian JIRA (v6.1#6144) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled. --- Key: SOLR-4722 URL: https://issues.apache.org/jira/browse/SOLR-4722 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 4.3, 5.0 Reporter: Tricia Jenkins Priority: Minor Attachments: SOLR-4722.patch, solr-positionshighlighter.jar As an alternative to returning snippets, this highlighter provides the (term) position for query matches.
One usecase for this is to reconcile the term position from the Solr index with 'word' coordinates provided by an OCR process. In this way we are able to 'highlight' an image, like a page from a book or an article from a newspaper, in the locations that match the user's query. This is based on the FastVectorHighlighter and requires that termVectors, termOffsets and termPositions be stored. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (SOLR-3535) Add block support for XMLLoader
[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293761#comment-13293761 ] Simon Rosenthal commented on SOLR-3535: --- Mikhail: it's not clear to me from the code/comments exactly what this issue/patch is meant to accomplish. I'm assuming that the intention is to be able to add atomically every document in the block at once? That is a use case which I have encountered (a batch update of a set of records with new product price information, where you want to commit them only when the complete set has been indexed, regardless of autocommits being fired off or other processes issuing commits). If that's the intention, this patch is great! I attempted to address the problem of undesired autocommits in SOLR-2664 - enable/disable autocommit on the fly - but that patch is very out of date. I do think it should be extended to updates in CSV/JSON and updates using the SolrJ API. +1 for Erik's suggestion on the syntax. Add block support for XMLLoader --- Key: SOLR-3535 URL: https://issues.apache.org/jira/browse/SOLR-3535 Project: Solr Issue Type: Sub-task Components: update Affects Versions: 4.1, 5.0 Reporter: Mikhail Khludnev Priority: Minor Attachments: SOLR-3535.patch I'd like to add the following update xml message: <add-block> <doc></doc> <doc></doc> </add-block> Out of scope for now: * other update formats * update log support (NRT), should not be a big deal * overwrite feature support for block updates - it's more complicated, I'll tell you why Alt: * wdyt about adding an attribute to the current tag {pre}<add block="true">{pre} * or we can establish RunBlockUpdateProcessor which treats every <add> </add> as a block. *Test is included!!* How would you suggest improving the patch? -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-2703: -- Attachment: SOLR-2703.patch New patch. Query parser not registered by default, and a commented out entry in example solrconfig was added. Hopefully ready to commit Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098106#comment-13098106 ] Simon Rosenthal commented on SOLR-2703: --- Wiki page to follow at http://wiki.apache.org/solr/SurroundQueryParser Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098133#comment-13098133 ] Simon Rosenthal commented on SOLR-2703: --- Should hold up on the commit until https://issues.apache.org/jira/browse/LUCENE-2945 patch has been committed, otherwise query caching is very broken. I updated the patch for that issue to work with trunk a few weeks ago. Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Assignee: Erik Hatcher Priority: Minor Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091187#comment-13091187 ] Simon Rosenthal commented on SOLR-2731: --- In addition to loading CSV results into a spreadsheet, I often use CSV as a quick-and-dirty way of dumping the contents of an index to be re-read into Solr, and adding lines which would need manual removal would be rather inconvenient. I'd go for option 4, with the comment symbol and result metadata on one line. org.apache.commons.csv has an option (which is not currently enabled in the CSVRequestHandler) to recognize and discard comment lines - adding a request parameter to the handler to recognize comment lines would be straightforward, and would at least solve my use case, though I admit not all others. CSVResponseWriter should optionally return numfound --- Key: SOLR-2731 URL: https://issues.apache.org/jira/browse/SOLR-2731 Project: Solr Issue Type: Improvement Components: Response Writers Affects Versions: 3.1, 3.3, 4.0 Reporter: Jon Hoffman Labels: patch Fix For: 3.1.1, 3.3, 4.0 Attachments: SOLR-2731.patch Original Estimate: 1h Remaining Estimate: 1h an optional parameter csv.numfound=true can be added to the request which causes the first line of the response to be the numfound. This would have no impact on existing behavior, and those who are interested in that value can simply read off the first line before sending to their usual csv parser. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
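For illustration, here is a hypothetical, self-contained sketch of the comment-discarding behavior discussed above: lines whose first non-blank character is the comment marker (here '#') are filtered out of a CSV payload before it reaches a parser. The method name and configurable marker are assumptions, not the commons-csv API itself:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CsvCommentFilter {
    // Drop lines whose first non-blank character is the comment marker,
    // so a downstream CSV parser only ever sees data rows.
    static List<String> stripComments(List<String> lines, char commentChar) {
        return lines.stream()
                .filter(l -> {
                    String t = l.stripLeading();
                    return t.isEmpty() || t.charAt(0) != commentChar;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "# numFound=1234",   // hypothetical metadata line, as in option 4
                "id,price",
                "42,19.99");
        System.out.println(stripComments(csv, '#'));
    }
}
```

With such a filter wired to a request parameter, the metadata line proposed in this issue could be skipped automatically when a CSV dump is re-read into Solr.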
[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound
[ https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091227#comment-13091227 ] Simon Rosenthal commented on SOLR-2731: --- good point. In that case I'm agnostic - 1) would be fine. CSVResponseWriter should optionally return numfound --- Key: SOLR-2731 URL: https://issues.apache.org/jira/browse/SOLR-2731 Project: Solr Issue Type: Improvement Components: Response Writers Affects Versions: 3.1, 3.3, 4.0 Reporter: Jon Hoffman Labels: patch Fix For: 3.1.1, 3.3, 4.0 Attachments: SOLR-2731.patch Original Estimate: 1h Remaining Estimate: 1h an optional parameter csv.numfound=true can be added to the request which causes the first line of the response to be the numfound. This would have no impact on existing behavior, and those who are interested in that value can simply read off the first line before sending to their usual csv parser. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-1725: -- Attachment: SOLR-1725-rev1.patch With the hope that this can be committed to trunk soon, I updated the patch to work with the reorganized sources in trunk, and made a couple of other small changes so that the tests would compile. Some tests fail - I'm seeing [junit] ERROR: SolrIndexSearcher opens=30 closes=28 [junit] junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=30 closes=28 in the ScriptUpdateProcessorFactoryTest Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, a *.js file will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. 
It is *not* required to define all methods, only those that are required by the processing logic. The following variables are defined as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}} - The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2664) Disable/enable autocommit on the fly
[ https://issues.apache.org/jira/browse/SOLR-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089690#comment-13089690 ] Simon Rosenthal commented on SOLR-2664: --- In working on extending the functionality to SolrJ/CSV/JSON, I've seen some shortcomings and ambiguities in the initial API I was working on. Here's a clearer approach: The anticipated use case would be to disable autocommits temporarily during update processing from a content stream (or from a server.add() in SolrJ). So for /update handlers where a content stream is specified, append the parameter deferAutoCommit=true to the URL. Autocommits will be disabled while the content stream is processed, and automatically re-enabled at the end of processing (regardless of success or failure). This will also be a recognized attribute in an <add> element for Solr XML processing. Additionally, when no content stream is specified, one can specify the single operations /update?disableAutoCommit=true and subsequently /update?enableAutoCommit=true, in the same way as you can specify commit as the only parameter. However, you're on your own if you do the one without the other. SolrJ will have new server#disableAutoCommit and server#enableAutocommit methods, and also add(doc, boolean deferAutoCommit). Disable/enable autocommit on the fly Key: SOLR-2664 URL: https://issues.apache.org/jira/browse/SOLR-2664 Project: Solr Issue Type: New Feature Components: update Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Fix For: 4.0 Attachments: SOLR-2664.patch There are occasions when although autocommit is configured, it would be desirable to disable it temporarily - for instance when batch adding/updating a set of documents which should be committed atomically (e.g. a set of price changes). 
The patch adds <disableAutoCommit/> and <enableAutoCommit/> commands to the XMLUpdateHandler, and also adds a disableAutoCommit=true|false attribute to the <add> element - this will disable autocommit until the terminating </add> at the end of the XML document is reached. At present, the autocommit state will not survive a core reload. It should be possible to extend this functionality to SolrJ, the CSVUpdateHandler (and the JSONUpdateHandler?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
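To make the proposed URL-level API concrete (hypothetical throughout - neither deferAutoCommit nor the enable/disable operations were ever released), a client would simply append the flag to the update request; the host, port, and core name below are placeholder assumptions:

```java
import java.net.URI;

public class AutoCommitUrls {
    // Proposed per-stream flag: autocommit is suspended only while this
    // content stream is processed, then automatically re-enabled.
    static URI deferUpdate(String base) {
        return URI.create(base + "/update?deferAutoCommit=true");
    }

    // Proposed standalone toggles, analogous to sending commit on its own.
    // As the comment warns, you're on your own if you send one without the other.
    static URI setAutoCommit(String base, boolean enable) {
        String op = enable ? "enableAutoCommit" : "disableAutoCommit";
        return URI.create(base + "/update?" + op + "=true");
    }

    public static void main(String[] args) {
        System.out.println(deferUpdate("http://localhost:8983/solr/core1"));
        System.out.println(setAutoCommit("http://localhost:8983/solr/core1", false));
    }
}
```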
[jira] [Commented] (SOLR-2664) Disable/enable autocommit on the fly
[ https://issues.apache.org/jira/browse/SOLR-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089783#comment-13089783 ] Simon Rosenthal commented on SOLR-2664: --- bq. Rather than disabling autoCommit globally (for all clients), shouldn't it just disable it for that particular client? A different client may be adding some time-sensitive documents. That's a much better approach! I like the batch parameter. Is it in fact now possible to autocommit (or not, in this case) only for a particular content stream/batch, when multiple ones are being indexed simultaneously? (my understanding has always been that commits/autocommits were global in effect). Disable/enable autocommit on the fly Key: SOLR-2664 URL: https://issues.apache.org/jira/browse/SOLR-2664 Project: Solr Issue Type: New Feature Components: update Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Fix For: 4.0 Attachments: SOLR-2664.patch There are occasions when although autocommit is configured, it would be desirable to disable it temporarily - for instance when batch adding/updating a set of documents which should be committed atomically (e.g. a set of price changes). The patch adds <disableAutoCommit/> and <enableAutoCommit/> commands to the XMLUpdateHandler, and also adds a disableAutoCommit=true|false attribute to the <add> element - this will disable autocommit until the terminating </add> at the end of the XML document is reached. At present, the autocommit state will not survive a core reload. It should be possible to extend this functionality to SolrJ, the CSVUpdateHandler (and the JSONUpdateHandler?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-2703: -- Attachment: SOLR-2703.patch Updated patch - adds a test suite, support for configuring maxBasicQueries in the URL, and support for configuring it as one of the default parsers Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Attachments: SOLR-2703.patch, SOLR-2703.patch The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2709) processing a synonym in a token stream will remove the following token from the stream
processing a synonym in a token stream will remove the following token from the stream -- Key: SOLR-2709 URL: https://issues.apache.org/jira/browse/SOLR-2709 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0 Environment: solr 4.0 built from trunk today Solr Specification Version: 4.0.0.2011.08.11.16.00.45 Solr Implementation Version: 4.0-SNAPSHOT 1156711M - simon - 2011-08-11 16:00:45 Lucene Specification Version: 4.0-SNAPSHOT Lucene Implementation Version: 4.0-SNAPSHOT 1156556 - simon - 2011-08-11 09:33:46 Current Time: Thu Aug 11 21:27:11 EDT 2011 Server Start Time: Thu Aug 11 20:59:11 EDT 2011 Reporter: Simon Rosenthal If you do a phrase search on a field derived from a fieldtype with the synonym filter, and the phrase includes a synonym, the term following the synonym vanishes after synonym expansion. e.g. http://host:port/solr/corename/select/?q=desc:%22xyzzy%20%20bbb%20pot%20of%20gold%22&version=2.2&start=0&rows=10&indent=on&debugQuery=true (bbb is in the default synonyms file, desc is a text fieldtype) outputs <str name="rawquerystring">desc:xyzzy bbb pot of gold</str> <str name="querystring">desc:xyzzy bbb pot of gold</str> <str name="parsedquery">PhraseQuery(desc:"xyzzy 1 2 of gold")</str> <str name="parsedquery_toString">desc:"xyzzy 1 2 of gold"</str> You can also see this behavior using the admin console analysis.jsp. Solr 3.3 behaves properly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082569#comment-13082569 ] Simon Rosenthal commented on SOLR-2703: --- +1 on eventually adding analysis support to the parser. The default (1024) for maxBasicQueries seems more than adequate but it wouldn't hurt to have it as a parameter. I found that a field simply analyzed with lowercasing and the EnglishPluralStemmer gave decent results, and wildcard searches using the base form of the term will mostly compensate for lack of stemming support - all this can be documented in the javadocs and the Wiki. Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Attachments: SOLR-2703.patch The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081893#comment-13081893 ] Simon Rosenthal commented on LUCENE-2945: - Paul - can you refactor the 2945d patch so that it will apply cleanly to the reorganized source tree, as the surround parser is now under modules/ ? Surround Query doesn't properly handle equals/hashcode -- Key: LUCENE-2945 URL: https://issues.apache.org/jira/browse/LUCENE-2945 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1.1, 4.0 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, LUCENE-2945d.patch In looking at using the surround queries with Solr, I am hitting issues caused by collisions due to equals/hashcode not being implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated LUCENE-2945: Attachment: LUCENE-2945e.patch LUCENE-2945e.patch uploaded. Modified from 2945d as you suggested, and it applied cleanly to the 08/08 nightly build Surround Query doesn't properly handle equals/hashcode -- Key: LUCENE-2945 URL: https://issues.apache.org/jira/browse/LUCENE-2945 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1.1, 4.0 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, LUCENE-2945d.patch, LUCENE-2945e.patch In looking at using the surround queries with Solr, I am hitting issues caused by collisions due to equals/hashcode not being implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated LUCENE-2945: Attachment: LUCENE-2945e.patch Revised patch - needed changes to package statements for a few files. Applies and compiles cleanly now. Surround Query doesn't properly handle equals/hashcode -- Key: LUCENE-2945 URL: https://issues.apache.org/jira/browse/LUCENE-2945 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1.1, 4.0 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, LUCENE-2945d.patch, LUCENE-2945e.patch, LUCENE-2945e.patch In looking at using the surround queries with Solr, I am hitting issues caused by collisions due to equals/hashcode not being implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2703) Add support for the Lucene Surround Parser
Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-2703: -- Attachment: SOLR-2703.patch initial patch. No tests yet. Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Attachments: SOLR-2703.patch The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081971#comment-13081971 ] Simon Rosenthal edited comment on SOLR-2703 at 8/9/11 10:28 PM: The most recent LUCENE-2945 patch is needed so that surround queries can be properly cached was (Author: simon.rosenthal): The most recent patch is needed so that surround queries can be properly cached Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Attachments: SOLR-2703.patch The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2667) Finish Solr Admin UI
[ https://issues.apache.org/jira/browse/SOLR-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071897#comment-13071897 ] Simon Rosenthal commented on SOLR-2667: --- The status displayed for DIH indexing is not as detailed as that on the old page - I'd prefer the elapsed time with more precision, rather than 'n minutes ago'. Since you're doing a status request every few seconds, would it be possible to add metrics such as 'documents processed per second'? (either for the last few seconds, or since the start of the import, or both) Finish Solr Admin UI Key: SOLR-2667 URL: https://issues.apache.org/jira/browse/SOLR-2667 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 4.0 Attachments: SOLR-2667-110722.patch In SOLR-2399, we added a new admin UI. The issue has gotten too long to follow, so this is a new issue to track remaining tasks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1032) CSV loader to support literal field values
[ https://issues.apache.org/jira/browse/SOLR-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-1032: -- Attachment: SOLR-1032.patch Here's a first cut at a patch. The syntax for the literal field is f.fieldname.literal=literalvalue. The supplied literal value is not processed in any way (e.g. split into multiple values, quotes removed). No tests yet. CSV loader to support literal field values -- Key: SOLR-1032 URL: https://issues.apache.org/jira/browse/SOLR-1032 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.3 Reporter: Erik Hatcher Priority: Minor Attachments: SOLR-1032.patch It would be very handy if the CSV loader could handle a literal field mapping, like the extracting request handler does. For example, in a scenario where you have multiple datasources (some data from a DB, some from file crawls, and some from CSV) it is nice to add a field to every document that specifies the data source. This is easily done with DIH with a template transformer, and Solr Cell with ext.literal.datasource=, but impossible currently with CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
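A hypothetical sketch of how a client might assemble the f.fieldname.literal=literalvalue parameter described in the patch; the helper name and the URL-encoding choice are assumptions for illustration, not part of the patch itself:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CsvLiteralParam {
    // Build the per-field literal parameter, e.g. f.datasource.literal=csv,
    // URL-encoding the value so it can be appended to a CSV update request.
    static String literalParam(String field, String value) {
        return "f." + field + ".literal=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Tag every document from this load with a fixed datasource value
        // (the handler path is an assumption).
        System.out.println("/update/csv?" + literalParam("datasource", "csv"));
    }
}
```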
[jira] [Commented] (SOLR-1032) CSV loader to support literal field values
[ https://issues.apache.org/jira/browse/SOLR-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070939#comment-13070939 ] Simon Rosenthal commented on SOLR-1032: --- Patch is for 4.0 CSV loader to support literal field values -- Key: SOLR-1032 URL: https://issues.apache.org/jira/browse/SOLR-1032 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.3 Reporter: Erik Hatcher Priority: Minor Attachments: SOLR-1032.patch It would be very handy if the CSV loader could handle a literal field mapping, like the extracting request handler does. For example, in a scenario where you have multiple datasources (some data from a DB, some from file crawls, and some from CSV) it is nice to add a field to every document that specifies the data source. This is easily done with DIH with a template transformer, and Solr Cell with ext.literal.datasource=, but impossible currently with CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2664) Disable/enable autocommit on the fly
Disable/enable autocommit on the fly Key: SOLR-2664 URL: https://issues.apache.org/jira/browse/SOLR-2664 Project: Solr Issue Type: New Feature Components: update Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Fix For: 4.0 There are occasions when although autocommit is configured, it would be desirable to disable it temporarily - for instance when batch adding/updating a set of documents which should be committed atomically (e.g. a set of price changes). The patch adds <disableAutoCommit/> and <enableAutoCommit/> commands to the XMLUpdateHandler, and also adds a disableAutoCommit=true|false attribute to the <add> element - this will disable autocommit until the terminating </add> at the end of the XML document is reached. At present, the autocommit state will not survive a core reload. It should be possible to extend this functionality to SolrJ, the CSVUpdateHandler (and the JSONUpdateHandler?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2664) Disable/enable autocommit on the fly
[ https://issues.apache.org/jira/browse/SOLR-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-2664: -- Attachment: SOLR-2664.patch Patch added. Disable/enable autocommit on the fly Key: SOLR-2664 URL: https://issues.apache.org/jira/browse/SOLR-2664 Project: Solr Issue Type: New Feature Components: update Affects Versions: 4.0 Reporter: Simon Rosenthal Priority: Minor Fix For: 4.0 Attachments: SOLR-2664.patch There are occasions when although autocommit is configured, it would be desirable to disable it temporarily - for instance when batch adding/updating a set of documents which should be committed atomically (e.g. a set of price changes). The patch adds <disableAutoCommit/> and <enableAutoCommit/> commands to the XMLUpdateHandler, and also adds a disableAutoCommit=true|false attribute to the <add> element - this will disable autocommit until the terminating </add> at the end of the XML document is reached. At present, the autocommit state will not survive a core reload. It should be possible to extend this functionality to SolrJ, the CSVUpdateHandler (and the JSONUpdateHandler?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2580) Create a new Search Component to alter queries based on business rules.
[ https://issues.apache.org/jira/browse/SOLR-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046222#comment-13046222 ] Simon Rosenthal commented on SOLR-2580: --- Tomas: I'm not sure why you would want to encapsulate these kinds of rules within Solr - an e-commerce site would always have an application layer between the UI and Solr, which seems like the logical place to apply business rules that modify the request by adding boosts, specifying sort order, etc. Also, is Drools separate from JBoss (which is used relatively infrequently in the Solr community)? Create a new Search Component to alter queries based on business rules. Key: SOLR-2580 URL: https://issues.apache.org/jira/browse/SOLR-2580 Project: Solr Issue Type: New Feature Reporter: Tomás Fernández Löbbe The goal is to be able to adjust the relevance of documents based on user-defined business rules. For example, in an e-commerce site, when the user chooses the shoes category, we may be interested in boosting products from a certain brand. This can be expressed as a rule in the following way: rule Boost Adidas products when searching shoes when $qt : QueryTool() TermQuery(term.field==category, term.text==shoes) then $qt.boost({!lucene}brand:adidas); end The QueryTool object should be used to alter the main query in an easy way. Even more human-like rules can be written: rule Boost Adidas products when searching shoes when Query has term shoes in field product then Add boost query {!lucene}brand:adidas end These rules are written in a text file in the config directory and can be modified at runtime. Rules will be managed using JBoss Drools: http://www.jboss.org/drools/drools-expert.html In a first stage, it will allow adding boost queries or changing sorting fields based on the user query, but it could be extended to allow more options. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
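Simon's alternative - applying the rules in the application layer between the UI and Solr rather than inside Solr - can be sketched in a few lines. This is a hypothetical Python sketch, not part of the proposal: `apply_business_rules`, the rule table, and the use of a `bq` boost-query parameter are illustrative stand-ins for the Drools rules above.

```python
def apply_business_rules(params, rules):
    """Rewrite a Solr request client-side: if the query contains a rule's
    trigger term, append the rule's boost query before sending the request.

    `rules` maps (field, value) triggers to boost-query strings - a
    hypothetical stand-in for the Drools rules in the proposal."""
    out = dict(params)
    for (field, value), boost in rules.items():
        if f"{field}:{value}" in out.get("q", ""):
            # add the boost as an extra bq (boost query) parameter
            out.setdefault("bq", []).append(boost)
    return out

# the example rule from the issue: boost Adidas when searching shoes
rules = {("category", "shoes"): "{!lucene}brand:adidas"}
request = apply_business_rules({"q": "category:shoes"}, rules)
```

With the example rule, the rewritten request carries `bq=["{!lucene}brand:adidas"]` while a query for another category passes through unchanged.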
[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983215#action_12983215 ] Simon Rosenthal commented on SOLR-445: -- bq. Don't allow autocommits during an update. Simple. Or, rather, all update requests block at the beginning during an autocommit. If an update request has too many documents, don't do so many documents in an update. (Lance) Lance - how do you (dynamically) disable autocommits during a specific update? That functionality would also be useful in other use cases, but that's another issue. bq. NOTE: This does change the behavior of Solr. Without this patch, the first document that is incorrect stops processing. Now, it continues merrily on adding documents as it can. Is this desirable behavior? It would be easy to abort on first error if that's the consensus, and I could take some tedious record-keeping out. I think there's no big problem with continuing on, since the state of committed documents is already indeterminate when errors occur, so worrying about this should be part of a bigger issue. I think it should be an option, if possible. I can see use cases where abort-on-first-error is desirable, but also situations where you know one or two documents may be erroneous, and it's worth continuing on in order to index the other 99%. XmlUpdateRequestHandler bad documents mid batch aborts rest of batch Key: SOLR-445 URL: https://issues.apache.org/jira/browse/SOLR-445 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.3 Reporter: Will Johnson Assignee: Erick Erickson Fix For: Next Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, solr-445.xml Has anyone run into the problem of handling bad documents / failures mid batch? I.e.:

<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>

Right now solr adds the first doc and then aborts. 
It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this, but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
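The abort-on-first-error vs. continue-and-report trade-off discussed above can be sketched client-side. This is a hypothetical Python sketch, not the SOLR-445 patch itself: `add_batch` and `send` (which stands in for whatever indexes a single document) are illustrative names.

```python
def add_batch(docs, send, abort_on_error=False):
    """Index docs one at a time. With abort_on_error=True the first bad
    document stops the batch (the pre-patch behaviour); otherwise failures
    are collected and the remaining documents are still indexed."""
    errors = []
    for i, doc in enumerate(docs):
        try:
            send(doc)
        except Exception as exc:
            if abort_on_error:
                raise
            errors.append((i, exc))
    return errors

# simulate the batch from the issue: doc 2 carries an unparseable date
indexed = []
def send(doc):
    if doc.get("myDateField") == "I_AM_A_BAD_DATE":
        raise ValueError("bad date")
    indexed.append(doc["id"])

docs = [{"id": 1}, {"id": 2, "myDateField": "I_AM_A_BAD_DATE"}, {"id": 3}]
errs = add_batch(docs, send)
```

In continue mode, docs 1 and 3 are indexed and the bad doc 2 is reported back - the "Option 2" behaviour of returning more information from the API.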
[jira] Commented: (SOLR-1911) File descriptor leak while indexing, may cause index corruption
[ https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880490#action_12880490 ] Simon Rosenthal commented on SOLR-1911: --- No - seems to have cleared up with trunk also. I'm OK with closing it, but am really curious to know what changed between mid May and today to clear up the problem. File descriptor leak while indexing, may cause index corruption --- Key: SOLR-1911 URL: https://issues.apache.org/jira/browse/SOLR-1911 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.5 Environment: Ubuntu Linux, Java build 1.6.0_16-b01 Solr Specification Version: 3.0.0.2010.05.12.16.17.46 Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 16:17:46 -- built from updated trunk Lucene Specification Version: 4.0-dev Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26 Current Time: Thu May 13 12:21:12 EDT 2010 Server Start Time: Thu May 13 11:45:41 EDT 2010 Reporter: Simon Rosenthal Priority: Critical Attachments: indexlsof.tar.gz, openafteropt.txt While adding documents to an already existing index using this build, the number of open file descriptors increases dramatically until the open-file per-process limit is reached (1024), at which point there are error messages in the log to that effect. If the server is restarted the index may be corrupt. Commits are handled by autocommit every 60 seconds or 500 documents (usually the time limit is reached first). mergeFactor is 10. It looks as though each time a commit takes place, the number of open files (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 40. There are several open file descriptors associated with each file in the index. Rerunning the same index updates with an older Solr (built from trunk in Feb 2010) doesn't show this problem - the number of open files fluctuates up and down as segments are created and merged, but stays basically constant. 
[jira] Updated: (SOLR-1911) File descriptor leak while indexing, may cause index corruption
[ https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-1911: -- Attachment: indexlsof.tar.gz OK - I built from latest trunk, used the schema associated with the index and the example solrconfig.xml, as you asked.
- Started with a snapshot of the index taken before this issue reared its head
- used post.sh to add a file with around 800 documents (a different one each time)
- did a commit (no autocommit)
- did an lsof on the process
Repeated the add/commit/lsof cycle 5 times. The attached tarball contains the lsof outputs, and we're still seeing the number of fds incrementing by 38-40 after each commit. I didn't go to the bitter end, but I assume we'd get there... Here's a clue: I looked for file descriptors associated with one .prx file that was present in the original snapshot in each lsof output:

grep -c _r8.prx lsof.*
lsof.0:1
lsof.1:2
lsof.2:3
lsof.3:4
lsof.4:5
lsof.5:6

The .frq files seem to have the same pattern. I'm assuming that's not good... 
File descriptor leak while indexing, may cause index corruption --- Key: SOLR-1911 URL: https://issues.apache.org/jira/browse/SOLR-1911 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.5 Environment: Ubuntu Linux, Java build 1.6.0_16-b01 Solr Specification Version: 3.0.0.2010.05.12.16.17.46 Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 16:17:46 -- built from updated trunk Lucene Specification Version: 4.0-dev Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26 Current Time: Thu May 13 12:21:12 EDT 2010 Server Start Time: Thu May 13 11:45:41 EDT 2010 Reporter: Simon Rosenthal Priority: Critical Attachments: indexlsof.tar.gz, openafteropt.txt While adding documents to an already existing index using this build, the number of open file descriptors increases dramatically until the open-file per-process limit is reached (1024), at which point there are error messages in the log to that effect. If the server is restarted the index may be corrupt. Commits are handled by autocommit every 60 seconds or 500 documents (usually the time limit is reached first). mergeFactor is 10. It looks as though each time a commit takes place, the number of open files (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 40. There are several open file descriptors associated with each file in the index. Rerunning the same index updates with an older Solr (built from trunk in Feb 2010) doesn't show this problem - the number of open files fluctuates up and down as segments are created and merged, but stays basically constant.
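The per-file descriptor counts above (the grep -c loop) can be automated. This is a hypothetical Python sketch, assuming lsof-style output where the file path is the last whitespace-separated column; `count_open_index_files` and the sample lines are illustrative, not part of the attached data.

```python
from collections import Counter

def count_open_index_files(lsof_output, suffixes=(".prx", ".frq", ".nrm")):
    """Count open descriptors per index file in `lsof -p <pid>` output.
    A count that grows for the same segment file across successive commits
    points at readers that are opened but never closed."""
    counts = Counter()
    for line in lsof_output.splitlines():
        parts = line.split()
        # the file path is the last column of each lsof record
        if parts and parts[-1].endswith(suffixes):
            counts[parts[-1]] += 1
    return counts

# illustrative sample: two descriptors on the same .prx file is the leak signature
sample = """java 14020 simon 50r REG 8,1 1024 123 /data/index/_r8.prx
java 14020 simon 51r REG 8,1 1024 123 /data/index/_r8.prx
java 14020 simon 52r REG 8,1 2048 124 /data/index/_r8.frq"""
counts = count_open_index_files(sample)
```

Running this after each commit and diffing the counters would reproduce the grep -c table without manual inspection.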
[jira] Commented: (SOLR-1911) File descriptor leak while indexing, may cause index corruption
[ https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867880#action_12867880 ] Simon Rosenthal commented on SOLR-1911: --- bq. 1) what mechanism are you using to index content? ie: POSTing XML from a remote client? using the stream.url or stream.file params? Using SolrCell? using DIH? (and if you are using DIH, from what source? DB? HTTP? File? .. and with what transformers?)
Posting XML from a local client; not using stream.url or stream.file.
bq. 2) what files does lsof show are open after each successive commit until the limit is reached? seeing how the file list grows - specifically which files are never getting closed - over time is really the only way to track down what code isn't closing files
Will attach lsof output taken after it reached the limit.
File descriptor leak while indexing, may cause index corruption --- Key: SOLR-1911 URL: https://issues.apache.org/jira/browse/SOLR-1911 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.5 Environment: Ubuntu Linux, Java build 1.6.0_16-b01 Solr Specification Version: 3.0.0.2010.05.12.16.17.46 Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 16:17:46 -- built from updated trunk Lucene Specification Version: 4.0-dev Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26 Current Time: Thu May 13 12:21:12 EDT 2010 Server Start Time: Thu May 13 11:45:41 EDT 2010 Reporter: Simon Rosenthal Priority: Critical While adding documents to an already existing index using this build, the number of open file descriptors increases dramatically until the open-file per-process limit is reached (1024), at which point there are error messages in the log to that effect. If the server is restarted the index may be corrupt. Commits are handled by autocommit every 60 seconds or 500 documents (usually the time limit is reached first). mergeFactor is 10. 
It looks as though each time a commit takes place, the number of open files (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 40. There are several open file descriptors associated with each file in the index. Rerunning the same index updates with an older Solr (built from trunk in Feb 2010) doesn't show this problem - the number of open files fluctuates up and down as segments are created and merged, but stays basically constant.
[jira] Updated: (SOLR-1911) File descriptor leak while indexing, may cause index corruption
[ https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Rosenthal updated SOLR-1911: -- Attachment: openafteropt.txt lsof output after the error occurred. File descriptor leak while indexing, may cause index corruption --- Key: SOLR-1911 URL: https://issues.apache.org/jira/browse/SOLR-1911 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.5 Environment: Ubuntu Linux, Java build 1.6.0_16-b01 Solr Specification Version: 3.0.0.2010.05.12.16.17.46 Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 16:17:46 -- built from updated trunk Lucene Specification Version: 4.0-dev Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26 Current Time: Thu May 13 12:21:12 EDT 2010 Server Start Time: Thu May 13 11:45:41 EDT 2010 Reporter: Simon Rosenthal Priority: Critical Attachments: openafteropt.txt While adding documents to an already existing index using this build, the number of open file descriptors increases dramatically until the open-file per-process limit is reached (1024), at which point there are error messages in the log to that effect. If the server is restarted the index may be corrupt. Commits are handled by autocommit every 60 seconds or 500 documents (usually the time limit is reached first). mergeFactor is 10. It looks as though each time a commit takes place, the number of open files (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 40. There are several open file descriptors associated with each file in the index. Rerunning the same index updates with an older Solr (built from trunk in Feb 2010) doesn't show this problem - the number of open files fluctuates up and down as segments are created and merged, but stays basically constant.
[jira] Created: (SOLR-1911) File descriptor leak while indexing, may cause index corruption
File descriptor leak while indexing, may cause index corruption --- Key: SOLR-1911 URL: https://issues.apache.org/jira/browse/SOLR-1911 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.5 Environment: Ubuntu Linux, Java build 1.6.0_16-b01 Solr Specification Version: 3.0.0.2010.05.12.16.17.46 Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 16:17:46 -- built from updated trunk Lucene Specification Version: 4.0-dev Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26 Current Time: Thu May 13 12:21:12 EDT 2010 Server Start Time: Thu May 13 11:45:41 EDT 2010 Reporter: Simon Rosenthal Priority: Critical While adding documents to an already existing index using this build, the number of open file descriptors increases dramatically until the open-file per-process limit is reached (1024), at which point there are error messages in the log to that effect. If the server is restarted the index may be corrupt. The Solr log reports:
May 13, 2010 12:37:04 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
May 13, 2010 12:37:04 PM org.apache.solr.update.DirectUpdateHandler2$CommitTracker run
SEVERE: auto commit error... 
java.io.FileNotFoundException: /home/simon/rig2/solr/core1/data/index/_j2.nrm (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:80)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:67)
        at org.apache.lucene.index.SegmentReader.openNorms(SegmentReader.java:1093)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:532)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:634)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:610)
        at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:1012)
        at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:4563)
        at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3775)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3623)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3614)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1769)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1732)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1696)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:230)
        at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
        at org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:602)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
May 13, 2010 12:37:04 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 1
May 13, 2010 12:37:04 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: directory '/home/simon/rig2/solr/core1/data/index' exists and is a directory, but cannot be listed: list() returned null
        at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:223)
        at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:234)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:582)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:535)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:316)
        at
[jira] Commented: (SOLR-1750) SystemStatsRequestHandler - replacement for stats.jsp
[ https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829354#action_12829354 ] Simon Rosenthal commented on SOLR-1750: --- +1 on SolrStatsRequestHandler. You might want to consider either omitting or making optional the Lucene FieldCache stats; they can often be *very* slow to generate (see http://www.lucidimagination.com/search/document/5ba908577d2e4c25/stats_page_slow_in_latest_nightly#2f40166c25f9bfa0 ). One use case for this request handler that I can see is high-frequency (every few seconds) monitoring as part of performance testing, for which a fast response is pretty much mandatory. SystemStatsRequestHandler - replacement for stats.jsp - Key: SOLR-1750 URL: https://issues.apache.org/jira/browse/SOLR-1750 Project: Solr Issue Type: Improvement Components: web gui Reporter: Erik Hatcher Assignee: Erik Hatcher Priority: Trivial Fix For: 1.5 Attachments: SystemStatsRequestHandler.java stats.jsp is cool and all, but suffers from escaping issues, and also is not accessible from SolrJ or other standard Solr APIs. Here's a request handler that emits everything stats.jsp does. For now, it needs to be registered in solrconfig.xml like this:
{code}
<requestHandler name="/admin/stats" class="solr.SystemStatsRequestHandler" />
{code}
But will register this in AdminHandlers automatically before committing.
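The high-frequency monitoring use case can be sketched as a timed poll loop. This is a hypothetical Python sketch: `poll_stats` and `fetch` are illustrative names, where `fetch` stands in for an HTTP call to the stats handler (e.g. urllib against a registered /admin/stats endpoint), and the sample/interval values are arbitrary.

```python
import time

def poll_stats(fetch, samples=5, interval_s=0.0):
    """Call `fetch` repeatedly and record per-call latency, so a slow
    stats handler (e.g. one computing FieldCache stats) shows up as
    rising latencies in the monitor rather than silently stalling it."""
    latencies = []
    for _ in range(samples):
        t0 = time.monotonic()
        fetch()  # in practice: an HTTP GET to the stats request handler
        latencies.append(time.monotonic() - t0)
        time.sleep(interval_s)
    return latencies

# a no-op fetch stands in for the HTTP call in this sketch
latencies = poll_stats(lambda: None, samples=3)
```

Timing each call separately is the point: a monitoring loop that samples every few seconds needs the handler's response time itself to stay well below the polling interval.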
[jira] Commented: (SOLR-1603) Perl Response Writer
[ https://issues.apache.org/jira/browse/SOLR-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806586#action_12806586 ] Simon Rosenthal commented on SOLR-1603: --- The patch installed fine. +1 for committing it. The output is a complex Perl data structure with search results, which would presumably be immediately assigned to a variable - not eval'd. Absolutely agree with Erik and Yonik - I can't think of a realistic case in which this would present a security risk. Perl Response Writer Key: SOLR-1603 URL: https://issues.apache.org/jira/browse/SOLR-1603 Project: Solr Issue Type: New Feature Components: Response Writers Reporter: Claudio Valente Priority: Minor Attachments: SOLR-1603.2.patch, SOLR-1603.patch I've made a patch that implements a Perl response writer for Solr. It's nan/inf and unicode aware. I don't know whether some fields can be binary, but if so I can probably extend it to support that.
[jira] Commented: (SOLR-1603) Perl Response Writer
[ https://issues.apache.org/jira/browse/SOLR-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806097#action_12806097 ] Simon Rosenthal commented on SOLR-1603: --- Couldn't build Solr with the current patch (against trunk), probably because the package for ResponseWriters was recently changed to org.apache.solr.response. Perl Response Writer Key: SOLR-1603 URL: https://issues.apache.org/jira/browse/SOLR-1603 Project: Solr Issue Type: New Feature Components: Response Writers Reporter: Claudio Valente Priority: Minor Attachments: SOLR-1603.patch I've made a patch that implements a Perl response writer for Solr. It's nan/inf and unicode aware. I don't know whether some fields can be binary, but if so I can probably extend it to support that.
[jira] Commented: (SOLR-1603) Perl Response Writer
[ https://issues.apache.org/jira/browse/SOLR-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782605#action_12782605 ] Simon Rosenthal commented on SOLR-1603: --- I'd be curious to know what the use case is for this - I've used both the JSON and XML writers to write results for a Perl web application with few problems. Perl Response Writer Key: SOLR-1603 URL: https://issues.apache.org/jira/browse/SOLR-1603 Project: Solr Issue Type: New Feature Components: Response Writers Reporter: Claudio Valente Attachments: SOLR-1603.patch I've made a patch that implements a Perl response writer for Solr. It's nan/inf and unicode aware. I don't know whether some fields can be binary, but if so I can probably extend it to support that.
[jira] Created: (SOLR-1509) Admin UI display of schema.xml can't find schema file
Admin UI display of schema.xml can't find schema file - Key: SOLR-1509 URL: https://issues.apache.org/jira/browse/SOLR-1509 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Simon Rosenthal Priority: Minor This is in a multicore environment; solr.xml contains:

<solr sharedLib="lib" shareSchema="true" persistent="true">
  <cores adminPath="/admin/cores">
    <core default="true" instanceDir="core1" name="core1" config="/home/solrdata/rig1/conf/solrconfig.xml" schema="/home/solrdata/rig1/conf/schema.xml">
      <property name="dataDir" value="/home/solrdata/rig1/core1" />
    </core>
    <core default="false" instanceDir="core2" name="core2" ...>
      ... schema same as above
    </core>
    ...
  </cores>
</solr>

When I go to the URL /solr/core1/admin/ and click on the schema link, the URL displayed by the browser is http://host:port/solr/core1/admin/file/?file=/home/solrdata/rig1/conf/schema.xml which looks correct, but an HTTP 400 error is displayed of the form: Can not find: schema.xml [/path/to/core1/conf/directory/home/solrdata/rig1/conf/schema.xml] It looks as though it's blindly appending the schema.xml path to the conf directory, even though it's not a relative one. Same for the other cores. The schema browser link on the admin page works fine.
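The behaviour the report expects can be sketched as an absolute-vs-relative resolution check. This is a hypothetical Python sketch, not Solr's actual code: `resolve_config_file` is an illustrative name for the logic the admin file handler appears to be missing.

```python
import os.path

def resolve_config_file(conf_dir, name):
    """Resolve a configured file reference: an absolute path is used
    as-is; only a relative name is joined onto the core's conf directory.
    The reported bug is the unconditional join."""
    return name if os.path.isabs(name) else os.path.join(conf_dir, name)

# the absolute schema path from the report must not be appended to conf/
resolved = resolve_config_file("/path/to/core1/conf", "/home/solrdata/rig1/conf/schema.xml")
```

With this check, the absolute schema path configured in solr.xml resolves to itself, while a plain "schema.xml" still resolves inside the core's conf directory.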