[jira] [Resolved] (SOLR-12327) 'bin/solr status' should be able to produce output as pure JSON

2018-05-07 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal resolved SOLR-12327.

Resolution: Not A Problem

Didn't realize that 

*curl 'http://localhost:8983/solr/admin/info/system'*

will return pure JSON with that information.
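For the use case in the issue below (mapping a Solr port to its ZooKeeper host/port), a rough sketch of pulling the connect string out of that endpoint's response — the "zkHost" field name and the canned sample response are assumptions based on what a cloud-mode node typically returns:

```shell
# Hypothetical sample of an admin/info/system response from a cloud-mode node;
# in practice this would come from:
#   curl -s 'http://localhost:8983/solr/admin/info/system?wt=json'
RESPONSE='{"mode":"solrcloud","zkHost":"localhost:9983"}'

# Extract the ZooKeeper connect string for a follow-up zkcli.sh call.
ZKHOST=$(printf '%s' "$RESPONSE" \
  | python3 -c 'import json,sys; print(json.load(sys.stdin).get("zkHost",""))')
echo "$ZKHOST"
```

A real script would substitute the curl call for the canned RESPONSE, and should handle the non-cloud case where no zkHost field is present.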

> 'bin/solr status' should be able to produce output as pure JSON
> ---
>
> Key: SOLR-12327
> URL: https://issues.apache.org/jira/browse/SOLR-12327
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: 7.2
>Reporter: Simon Rosenthal
>Priority: Minor
>
> The 'bin/solr status' command should optionally (so as not to break back 
> compat) produce its outputs in pure JSON, for easier parsing, rather than a 
> mixture of free text and JSON as it does at present.
> e.g.
> {{prompt# bin/solr status purejson}}
> {{ {}}
> {{# these two lines replace "Solr process  on port "}}
> {{ *"solr_port":"8983",*}}
> {{ *"solr_pid":"14020",*}}
> {{ "solr_home":"/home/user/solr-7.2.1/server/solr",}}
> {{ "version":"7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 
> 2018-01-10 00:54:21",}}
> {{ "startTime":"2018-05-02T13:51:43.388Z",}}
> {{ "uptime":"5 days, 2 hours, 39 minutes, 35 seconds",}}
> {{ "memory":"1.2 GB (%25.7) of 4.8 GB",}}
> {{ "cloud":{}}
> {{ "ZooKeeper":"localhost:9983",}}
> {{ "liveNodes":"1",}}
> {{ "collections":"1"
> The use case here is mapping a solr port (where that is the only available 
> information about the Solr instance) to ZK host/port(s) for a subsequent call 
> to zkcli.sh.
>  
> {{ }}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-12327) 'bin/solr status' should be able to produce output as pure JSON

2018-05-07 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466234#comment-16466234
 ] 

Simon Rosenthal edited comment on SOLR-12327 at 5/7/18 5:39 PM:


Didn't realize that 

*curl 'http://localhost:8983/solr/admin/info/system'*

will return pure JSON with that information.


was (Author: simon.rosenthal):
Didn't realize that 

*curl 'http://localhost:8983/solr/admin/info/system'*

will return  pure JON with that information

> 'bin/solr status' should be able to produce output as pure JSON
> ---
>
> Key: SOLR-12327
> URL: https://issues.apache.org/jira/browse/SOLR-12327
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: 7.2
>Reporter: Simon Rosenthal
>Priority: Minor
>
> The 'bin/solr status' command should optionally (so as not to break back 
> compat) produce its outputs in pure JSON, for easier parsing, rather than a 
> mixture of free text and JSON as it does at present.
> e.g.
> {{prompt# bin/solr status purejson}}
> {{ {}}
> {{# these two lines replace "Solr process  on port "}}
> {{ *"solr_port":"8983",*}}
> {{ *"solr_pid":"14020",*}}
> {{ "solr_home":"/home/user/solr-7.2.1/server/solr",}}
> {{ "version":"7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 
> 2018-01-10 00:54:21",}}
> {{ "startTime":"2018-05-02T13:51:43.388Z",}}
> {{ "uptime":"5 days, 2 hours, 39 minutes, 35 seconds",}}
> {{ "memory":"1.2 GB (%25.7) of 4.8 GB",}}
> {{ "cloud":{}}
> {{ "ZooKeeper":"localhost:9983",}}
> {{ "liveNodes":"1",}}
> {{ "collections":"1"
> The use case here is mapping a solr port (where that is the only available 
> information about the Solr instance) to ZK host/port(s) for a subsequent call 
> to zkcli.sh.
>  
> {{ }}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (SOLR-12327) 'bin/solr status' should be able to produce output as pure JSON

2018-05-07 Thread Simon Rosenthal (JIRA)
Simon Rosenthal created SOLR-12327:
--

 Summary: 'bin/solr status' should be able to produce output as 
pure JSON
 Key: SOLR-12327
 URL: https://issues.apache.org/jira/browse/SOLR-12327
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: scripts and tools
Affects Versions: 7.2
Reporter: Simon Rosenthal


The 'bin/solr status' command should optionally (so as not to break back 
compat) produce its outputs in pure JSON, for easier parsing, rather than a 
mixture of free text and JSON as it does at present.

e.g.

{{prompt# bin/solr status purejson}}
{{ {}}

{{# these two lines replace "Solr process  on port "}}
{{ *"solr_port":"8983",*}}
{{ *"solr_pid":"14020",*}}
{{ "solr_home":"/home/user/solr-7.2.1/server/solr",}}
{{ "version":"7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 
2018-01-10 00:54:21",}}
{{ "startTime":"2018-05-02T13:51:43.388Z",}}
{{ "uptime":"5 days, 2 hours, 39 minutes, 35 seconds",}}
{{ "memory":"1.2 GB (%25.7) of 4.8 GB",}}
{{ "cloud":{}}
{{ "ZooKeeper":"localhost:9983",}}
{{ "liveNodes":"1",}}
{{ "collections":"1"

The use case here is mapping a solr port (where that is the only available 
information about the Solr instance) to ZK host/port(s) for a subsequent call 
to zkcli.sh.

 


{{ }}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (SOLR-8767) RealTimeGetComponent and stored/copyField exclusion

2018-05-02 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461800#comment-16461800
 ] 

Simon Rosenthal commented on SOLR-8767:
---

+1 on changing the behavior. I was just bitten by this real-time get behavior in 
a situation where I was using a copyField to effectively rename field a to field b 
(a was then defined as non-indexed, non-stored).

Not the intended use of copyField, I know, and I could probably use a 
FieldNameMutatingUpdateProcessor instead (though this results in including 
field name manipulations in solrconfig.xml, where they really don't belong, 
rather than in the schema).
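For context, the rename-via-copyField setup described above would look roughly like this in schema.xml — the field names and types here are illustrative assumptions, not the actual schema:

```xml
<!-- hypothetical sketch: "a" is kept only as a copy source, "b" is the real field -->
<field name="a" type="string" indexed="false" stored="false"/>
<field name="b" type="string" indexed="true" stored="true"/>
<copyField source="a" dest="b"/>
```

With RealTimeGetComponent excluding copyField destinations, a real-time /get on a document indexed this way comes back without b, which is the surprise being reported.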

> RealTimeGetComponent and stored/copyField exclusion
> ---
>
> Key: SOLR-8767
> URL: https://issues.apache.org/jira/browse/SOLR-8767
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Erik Hatcher
>Priority: Critical
>
> Consider this scenario: schema has fields `a` and `b` defined, both stored.  
> A copyField is defined from `a` => `b`.  A document is indexed with `id=1; 
> b="foo"`.  A real-time /get will not return field `b` because 
> RealTimeGetComponent.toSolrDoc currently excludes copyField destinations 
> (despite, in this situation, the source of that copyField not being sent in).
> Granted this is a bit of a diabolical case (discovered while tinkering with 
> cloud MLT tests), but isn't that far fetched to happen in the wild.
> Maybe real-time /get should return all fields set as stored, regardless of 
> copyField status?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Resolved] (SOLR-10840) Random Index Corruption during bulk indexing

2017-11-09 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal resolved SOLR-10840.

Resolution: Cannot Reproduce

After moving our production Solr server to a new AWS instance, the problem 
disappeared. Heaven knows why.

> Random Index Corruption during bulk indexing
> 
>
> Key: SOLR-10840
> URL: https://issues.apache.org/jira/browse/SOLR-10840
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Affects Versions: 6.3, 6.5.1
> Environment: AWS EC2 instance running Centos 7
>Reporter: Simon Rosenthal
>
> I'm seeing a randomly occurring index corruption exception during a Solr data 
> ingest. This can occur anywhere during the 7-8 hours our ingests take. I'm 
> initially submitting this as a Solr bug since this is the environment I'm 
> using, but it does look as though the error is occurring in Lucene code.
> Some background:
> AWS EC2 server running CentOS 7
> java.runtime.version: 1.8.0_131-b11 (also occurred with 1.8.0_45).
> Solr 6.3.0 (have also seen it with Solr 6.5.1). It did not happen with 
> Solr 5.4 (which I can't go back to). Oddly enough, I ran Solr 6.3.0 
> uneventfully for several weeks before this problem first occurred.
> Standalone (non-cloud) environment.
> Standalone  (non cloud) environment.
> Our indexing subsystem is a complex Python script which creates multiple 
> indexing subprocesses in order to make use of multiple cores. Each subprocess 
> reads records from a MySQL database, does some significant preprocessing, and 
> sends batches of documents (defaulting to 500) to the Solr update handler 
> (using the Python 'scorched' module). Each content source (there are 5-6) 
> requires a separate instantiation of the script, and these are wrapped in a 
> Bash script to run serially.
> 
> When the exception occurs, we always see something like the following in 
> the solr.log
> 
> ERROR - 2017-06-06 14:37:34.639; [   x:stresstest1] 
> org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: 
> Exception writing document id med-27840-00384802 to the index; possible 
> analysis error.
> at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178
> ...
> Caused by: org.apache.lucene.store.AlreadyClosedException: this 
> IndexWriter is closed
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:740)
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:754)
> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1558)
> at 
> org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:279)
> at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
> at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
> ... 42 more
> Caused by: java.io.EOFException: read past EOF: 
> MMapIndexInput(path="/indexes/solrindexes/stresstest1/index/_441.nvm")
> at 
> org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
> at 
> org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
> at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
> at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
> at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
> at 
> org.apache.lucene.codecs.lucene53.Lucene53NormsProducer.(Lucene53NormsProducer.java:58)
> at 
> org.apache.lucene.codecs.lucene53.Lucene53NormsFormat.normsProducer(Lucene53NormsFormat.java:82)
> at 
> org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:113)
> at org.apache.lucene.index.SegmentReader.(SegmentReader.java:74)
> at 
> org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
> at 
> org.apache.lucene.index.BufferedUpdatesStream$SegmentState.(BufferedUpdatesStream.java:384)
> at 
> org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:416)
> at 
> org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:261)
> at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4068)
> at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4026)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3880)
> at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
> at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
> Suppressed: org.apache.lucene.index.CorruptIndexException: checksum 
> 

[jira] [Created] (SOLR-10840) Random Index Corruption during bulk indexing

2017-06-07 Thread Simon Rosenthal (JIRA)
Simon Rosenthal created SOLR-10840:
--

 Summary: Random Index Corruption during bulk indexing
 Key: SOLR-10840
 URL: https://issues.apache.org/jira/browse/SOLR-10840
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: update
Affects Versions: 6.5.1, 6.3
 Environment: AWS EC2 instance running Centos 7
Reporter: Simon Rosenthal



I'm seeing a randomly occurring index corruption exception during a Solr data 
ingest. This can occur anywhere during the 7-8 hours our ingests take. I'm 
initially submitting this as a Solr bug since this is the environment I'm using, 
but it does look as though the error is occurring in Lucene code.

Some background:

AWS EC2 server running CentOS 7
java.runtime.version: 1.8.0_131-b11 (also occurred with 1.8.0_45).
Solr 6.3.0 (have also seen it with Solr 6.5.1). It did not happen with Solr 
5.4 (which I can't go back to). Oddly enough, I ran Solr 6.3.0 uneventfully for 
several weeks before this problem first occurred.
Standalone (non-cloud) environment.

Our indexing subsystem is a complex Python script which creates multiple 
indexing subprocesses in order to make use of multiple cores. Each subprocess 
reads records from a MySQL database, does some significant preprocessing, and 
sends batches of documents (defaulting to 500) to the Solr update handler (using 
the Python 'scorched' module). Each content source (there are 5-6) requires a 
separate instantiation of the script, and these are wrapped in a Bash script to 
run serially.

When the exception occurs, we always see something like the following in 
the solr.log

ERROR - 2017-06-06 14:37:34.639; [   x:stresstest1] 
org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: 
Exception writing document id med-27840-00384802 to the index; possible 
analysis error.
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178
...
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter 
is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:740)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:754)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1558)
at 
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:279)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
... 42 more
Caused by: java.io.EOFException: read past EOF: 
MMapIndexInput(path="/indexes/solrindexes/stresstest1/index/_441.nvm")
at 
org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
at 
org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
at 
org.apache.lucene.codecs.lucene53.Lucene53NormsProducer.(Lucene53NormsProducer.java:58)
at 
org.apache.lucene.codecs.lucene53.Lucene53NormsFormat.normsProducer(Lucene53NormsFormat.java:82)
at 
org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:113)
at org.apache.lucene.index.SegmentReader.(SegmentReader.java:74)
at 
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at 
org.apache.lucene.index.BufferedUpdatesStream$SegmentState.(BufferedUpdatesStream.java:384)
at 
org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:416)
at 
org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:261)
at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4068)
at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4026)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3880)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status 
indeterminate: remaining=0, please run checkindex for more details 
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/indexes/solrindexes/stresstest1/index/_441.nvm")))
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:451)
at 
org.apache.lucene.codecs.lucene53.Lucene53NormsProducer.(Lucene53NormsProducer.java:63)
... 12 more
 
This is usually followed in very short order by similar 

[jira] [Created] (SOLR-10674) Don't delete core.properties when a core is unloaded

2017-05-11 Thread Simon Rosenthal (JIRA)
Simon Rosenthal created SOLR-10674:
--

 Summary: Don't delete core.properties when a core is unloaded
 Key: SOLR-10674
 URL: https://issues.apache.org/jira/browse/SOLR-10674
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 6.5.1, 6.3
 Environment: Centos 7 on AWS, Java 8
Reporter: Simon Rosenthal
Priority: Minor



In earlier versions of Solr, unloading a core caused its core.properties file 
to be renamed to something like core.properties.unloaded. The current behavior 
(observed in 6.3.0 and 6.5.1, running in non-cloud mode) is to delete 
core.properties, which is extremely inconvenient if it is anticipated that the 
core will be reloaded.

Here's the logfile entry for the unload request:
INFO  - 2017-05-11 09:39:23.960; [   ] org.apache.solr.servlet.HttpSolrCall; 
[admin] webapp=null path=/admin/cores 
params={core=dev0510=UNLOAD=json&_=1494513368797} status=0 QTime=624

Please consider restoring the previous behavior.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Updated] (SOLR-8978) Support comment lines in CSV input files

2016-04-13 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-8978:
--
Description: There is what looks like latent support for identifying and 
skipping comment lines in the CSV update handler (CSVStrategy.java and 
CSVParser.java), but it is not exposed. It would be a useful improvement to 
enable this functionality via a request parameter (e.g. commentCharacter="#")  
(was: There is latent support for identifying and skipping comment lines in the 
CSV update handler (CSVStrategy.java and CSVParser.java), but it is not 
exposed. It would be a useful improvement to enable this functionality via a 
request parameter (e.g. commentCharacter="#"))

> Support comment lines in CSV input files
> 
>
> Key: SOLR-8978
> URL: https://issues.apache.org/jira/browse/SOLR-8978
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Simon Rosenthal
>Priority: Minor
>
> There is what looks like latent support for identifying and skipping comment 
> lines in the CSV update handler (CSVStrategy.java and CSVParser.java), but 
> it is not exposed. It would be a useful improvement to enable this 
> functionality via a request parameter (e.g. commentCharacter="#")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Created] (SOLR-8978) Support comment lines in CSV input files

2016-04-13 Thread Simon Rosenthal (JIRA)
Simon Rosenthal created SOLR-8978:
-

 Summary: Support comment lines in CSV input files
 Key: SOLR-8978
 URL: https://issues.apache.org/jira/browse/SOLR-8978
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Simon Rosenthal
Priority: Minor


There is latent support for identifying and skipping comment lines in the CSV 
update handler (CSVStrategy.java and CSVParser.java), but it is not exposed. 
It would be a useful improvement to enable this functionality via a request 
parameter (e.g. commentCharacter="#")
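Until such a parameter is exposed, one client-side workaround is to strip comment lines before posting the file; the sample data, the '#' comment character, and the core name in the commented-out curl line are assumptions for illustration:

```shell
# Hypothetical CSV with comment lines.
cat > /tmp/input.csv <<'EOF'
# exported 2016-04-13
id,name
1,alpha
2,beta
EOF

# Drop lines that start with the comment character before sending to Solr.
grep -v '^#' /tmp/input.csv > /tmp/clean.csv
cat /tmp/clean.csv

# The cleaned file could then be posted as usual (core name hypothetical):
#   curl 'http://localhost:8983/solr/mycore/update?commit=true' \
#        -H 'Content-Type: text/csv' --data-binary @/tmp/clean.csv
```

Note this only handles comments at the start of a line, which is also all the proposed commentCharacter parameter would cover.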



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (SOLR-8740) use docValues by default

2016-03-08 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184988#comment-15184988
 ] 

Simon Rosenthal commented on SOLR-8740:
---

If this is adopted, it needs to be clearly documented that docValues do not 
retain ordering in multivalued fields, whereas stored fields do. Our use case: 
picking the first and last authors from a multivalued 'authors' string field.


> use docValues by default
> 
>
> Key: SOLR-8740
> URL: https://issues.apache.org/jira/browse/SOLR-8740
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master
>Reporter: Yonik Seeley
> Fix For: master
>
>
> We should consider switching to docValues for most of our non-text fields.  
> This may be a better default since it is more NRT friendly and acts to avoid 
> OOM errors due to large field cache or UnInvertedField entries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Created] (SOLR-7088) bin/solr script will not start Solr if CDPATH environment variable is set

2015-02-06 Thread Simon Rosenthal (JIRA)
Simon Rosenthal created SOLR-7088:
-

 Summary: bin/solr script will not start Solr if CDPATH environment 
variable is set
 Key: SOLR-7088
 URL: https://issues.apache.org/jira/browse/SOLR-7088
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Affects Versions: 5.0
 Environment: Ubuntu 14.4, BASH 4.3
Reporter: Simon Rosenthal


If the Bash environment variable CDPATH is set, the 'cd' builtin prints the 
target directory to stdout whenever it resolves its argument through CDPATH. 
This breaks the bin/solr script at
 
SOLR_TIP=`cd $SOLR_TIP; pwd`
DEFAULT_SERVER_DIR=$SOLR_TIP/server

As a result, the command substitution captures both the line printed by 'cd' 
and the output of 'pwd', so SOLR_TIP contains the directory name twice, with 
disastrous results subsequently.
The fix is to add

unset CDPATH

early in the script.
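A minimal reproduction of the doubling, using a throwaway directory (the path is an assumption for illustration):

```shell
# Set up a demo directory whose parent is on CDPATH.
mkdir -p /tmp/cdpath_demo/solr-demo
cd /
export CDPATH=/tmp/cdpath_demo

# Because 'solr-demo' is resolved via CDPATH, 'cd' prints
# "/tmp/cdpath_demo/solr-demo" before 'pwd' prints it again,
# so the substitution captures the path twice.
BROKEN=$(cd solr-demo; pwd)

# With CDPATH unset, only pwd's output is captured.
unset CDPATH
FIXED=$(cd /tmp/cdpath_demo/solr-demo; pwd)

echo "broken: $BROKEN"
echo "fixed:  $FIXED"
```

Running unset CDPATH early in bin/solr makes the `cd ...; pwd` idiom safe regardless of the user's environment.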



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores

2014-04-22 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976946#comment-13976946
 ] 

Simon Rosenthal commented on SOLR-5871:
---

+1 to add this functionality - at least an enumeration of the fields where 
there were matches. I could live without having the matching terms.





 Ability to see the list of fields that matched the query with scores
 

 Key: SOLR-5871
 URL: https://issues.apache.org/jira/browse/SOLR-5871
 Project: Solr
  Issue Type: Wish
Reporter: Alexander S.
Assignee: Erick Erickson

 Hello, I need the ability to tell users what content matched their query, 
 this way:
 | Name     | Twitter Profile | Topics | Site Title | Site Description | Site content |
 | John Doe | Yes             | No     | Yes        | No               | Yes          |
 | Jane Doe | No              | Yes    | No         | No               | Yes          |
 All these columns are indexed text fields and I need to know what content 
 matched the query and would be also cool to be able to show the score per 
 field.
 As far as I know right now there's no way to return this information when 
 running a query request. Debug outputs is suitable for visual review but has 
 lots of nesting levels and is hard for understanding.



--
This message was sent by Atlassian JIRA
(v6.2#6252)




[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

2013-12-05 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840622#comment-13840622
 ] 

Simon Rosenthal commented on SOLR-4722:
---

I also needed to remove the line with the unnecessary import

import org.apache.lucene.index.StoredDocument

as that class is not in Solr 4.4. After that, it worked like a charm.

 Highlighter which generates a list of query term position(s) for each item in 
 a list of documents, or returns null if highlighting is disabled.
 ---

 Key: SOLR-4722
 URL: https://issues.apache.org/jira/browse/SOLR-4722
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 4.3, 5.0
Reporter: Tricia Jenkins
Priority: Minor
 Attachments: SOLR-4722.patch, SOLR-4722.patch, 
 solr-positionshighlighter.jar


 As an alternative to returning snippets, this highlighter provides the (term) 
 position for query matches.  One usecase for this is to reconcile the term 
 position from the Solr index with 'word' coordinates provided by an OCR 
 process.  In this way we are able to 'highlight' an image, like a page from a 
 book or an article from a newspaper, in the locations that match the user's 
 query.
 This is based on the FastVectorHighlighter and requires that termVectors, 
 termOffsets and termPositions be stored.



--
This message was sent by Atlassian JIRA
(v6.1#6144)




[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

2013-11-14 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822537#comment-13822537
 ] 

Simon Rosenthal commented on SOLR-4722:
---

Great patch !

I'd like to use the code as the basis for a component which will simply return 
term positions for each query term - no need for having highlighting enabled as 
a prerequisite, or to return term offsets - this is a text mining project where 
we'll be running queries in batch mode and storing this information externally. 

Can you think of any gotchas I might encounter ?

 Highlighter which generates a list of query term position(s) for each item in 
 a list of documents, or returns null if highlighting is disabled.
 ---

 Key: SOLR-4722
 URL: https://issues.apache.org/jira/browse/SOLR-4722
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 4.3, 5.0
Reporter: Tricia Jenkins
Priority: Minor
 Attachments: SOLR-4722.patch, solr-positionshighlighter.jar


 As an alternative to returning snippets, this highlighter provides the (term) 
 position for query matches.  One usecase for this is to reconcile the term 
 position from the Solr index with 'word' coordinates provided by an OCR 
 process.  In this way we are able to 'highlight' an image, like a page from a 
 book or an article from a newspaper, in the locations that match the user's 
 query.
 This is based on the FastVectorHighlighter and requires that termVectors, 
 termOffsets and termPositions be stored.



--
This message was sent by Atlassian JIRA
(v6.1#6144)




[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

2013-11-14 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822893#comment-13822893
 ] 

Simon Rosenthal commented on SOLR-4722:
---

Just one oddity - there are references to the StoredDocument class in 
getUniqueKeys() which (as far as I can see) is only in trunk - and I'm using 
Lucene/Solr 4.5.1. I replaced that with Document =  which compiles OK, but I 
haven't had a chance to try it out yet. Do you think it should work ?  

-Simon




On Thursday, November 14, 2013 12:21 PM, Tricia Jenkins (JIRA) 
j...@apache.org wrote:
 

    [ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822636#comment-13822636
 ] 

Tricia Jenkins commented on SOLR-4722:
--

Thanks for your interest.  This code/jar could be used as is for your purposes.

If you don't want to specify highlighting enabled in each query just move it to 
conf/solrconfig.xml:
{code:xml}
  <requestHandler name="standard" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <str name="hl">true</str>
    </lst>
  </requestHandler>
{code}

This highlighter only returns the term positions.  The term offsets are stored 
because they're used by the FastVectorHighlighter.  You won't get any useful 
information from this highlighter if you disable termOffsets in your schema.xml.

I just ran this patch against trunk.  Still works!




--
This message was sent by Atlassian JIRA
(v6.1#6144)


 Highlighter which generates a list of query term position(s) for each item in 
 a list of documents, or returns null if highlighting is disabled.
 ---

 Key: SOLR-4722
 URL: https://issues.apache.org/jira/browse/SOLR-4722
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 4.3, 5.0
Reporter: Tricia Jenkins
Priority: Minor
 Attachments: SOLR-4722.patch, solr-positionshighlighter.jar


 As an alternative to returning snippets, this highlighter provides the (term) 
 position for query matches.  One usecase for this is to reconcile the term 
 position from the Solr index with 'word' coordinates provided by an OCR 
 process.  In this way we are able to 'highlight' an image, like a page from a 
 book or an article from a newspaper, in the locations that match the user's 
 query.
 This is based on the FastVectorHighlighter and requires that termVectors, 
 termOffsets and termPositions be stored.



--
This message was sent by Atlassian JIRA
(v6.1#6144)




[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

2012-06-12 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293761#comment-13293761
 ] 

Simon Rosenthal commented on SOLR-3535:
---

Mikhail:
It's not clear to me from the code/comments exactly what this issue/patch is 
meant to accomplish. I'm assuming that the intention is to be able to add every 
document in the block atomically, all at once?

That is a use case which I have encountered (a batch update of a set of 
records with new product price information, where you want to commit them only 
when the complete set has been indexed, regardless of autocommits being fired 
off or other processes issuing commits). If that's the intention, this patch is 
great!

I attempted to address the problem of undesired autocommits  in SOLR-2664 - 
enable/disable autocommit on the fly, but that patch is very out of date.

I do think it should be extended to updates in CSV/JSON and updates using the 
SolrJ API.

+1 for Erik's suggestion on the syntax.



 Add block support for XMLLoader
 ---

 Key: SOLR-3535
 URL: https://issues.apache.org/jira/browse/SOLR-3535
 Project: Solr
  Issue Type: Sub-task
  Components: update
Affects Versions: 4.1, 5.0
Reporter: Mikhail Khludnev
Priority: Minor
 Attachments: SOLR-3535.patch


 I'd like to add the following update xml message:
 <add-block>
 <doc>...</doc>
 <doc>...</doc>
 </add-block>
 out of scope for now: 
 * other update formats
 * update log support (NRT), should not be a big deal
 * overwrite feature support for block updates - it's more complicated, I'll 
 tell you why
 Alternatives:
 * wdyt about adding an attribute to the current tag {pre}<add block="true">{pre} 
 * or we can establish a RunBlockUpdateProcessor which treats every <add> 
 ... </add> as a block.
 *Test is included!!*
 How would you suggest improving the patch?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Updated] (SOLR-2703) Add support for the Lucene Surround Parser

2011-09-06 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-2703:
--

Attachment: SOLR-2703.patch

New patch. The query parser is not registered by default, and a commented-out 
entry in the example solrconfig was added.

Hopefully ready to commit
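For illustration, a commented-out registration in the example solrconfig.xml might look like the following sketch. The class name and attribute layout here are assumptions based on this discussion, not taken from the actual patch:

```xml
<!-- Uncomment to register the surround query parser under the name "surround";
     it can then be selected per-request with defType=surround or {!surround}.
<queryParser name="surround" class="org.apache.solr.search.SurroundQParserPlugin"/>
-->
```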

 Add support for the Lucene Surround Parser
 --

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch


 The Lucene/contrib surround parser provides support for span queries. This 
 issue adds a Solr plugin for this parser




[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser

2011-09-06 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098106#comment-13098106
 ] 

Simon Rosenthal commented on SOLR-2703:
---

Wiki page to follow  at http://wiki.apache.org/solr/SurroundQueryParser

 Add support for the Lucene Surround Parser
 --

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch


 The Lucene/contrib surround parser provides support for span queries. This 
 issue adds a Solr plugin for this parser




[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser

2011-09-06 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098133#comment-13098133
 ] 

Simon Rosenthal commented on SOLR-2703:
---

Should hold up on the commit until 
https://issues.apache.org/jira/browse/LUCENE-2945 patch has been committed, 
otherwise query caching is very broken. I updated the patch for that issue to 
work with trunk a few weeks ago.

 Add support for the Lucene Surround Parser
 --

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Assignee: Erik Hatcher
Priority: Minor
 Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch


 The Lucene/contrib surround parser provides support for span queries. This 
 issue adds a Solr plugin for this parser




[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound

2011-08-25 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091187#comment-13091187
 ] 

Simon Rosenthal commented on SOLR-2731:
---

In addition to loading CSV results into a spreadsheet, I often use CSV as a 
quick-and-dirty way of dumping the contents of an index to be re-read into 
Solr, and adding lines which would need manual removal would be rather 
inconvenient.

I'd go for option 4, with the comment symbol and result metadata on one line. 
org.apache.commons.csv has an option (which is not currently enabled in the 
CSVRequestHandler) to recognize and discard comment lines - adding a request 
parameter to the handler to recognize comment lines would be straightforward, 
and would at least solve my use case, though I admit not all others.
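As a sketch of how comment-line handling would behave downstream (the metadata line format here is hypothetical; the actual parameter and syntax were still under discussion), a client could read the metadata off the leading '#' line and discard comment lines before handing the payload to a CSV parser:

```python
import csv

# Hypothetical CSV response: result metadata on a single comment line,
# followed by the normal header and data rows.
raw = "#numFound=2\nid,name\n1,apple\n2,banana\n"

lines = raw.splitlines()
# Read the metadata from the comment line, then drop all comment lines
# before parsing -- the behavior a comment-aware parser option would give.
num_found = int(lines[0].split("=", 1)[1])
rows = list(csv.DictReader(l for l in lines if not l.startswith("#")))

print(num_found)        # 2
print(rows[0]["name"])  # apple
```

This mirrors what a request parameter enabling commons-csv's comment support would let the response writer emit without breaking plain CSV consumers.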



 

 CSVResponseWriter should optionally return numfound
 ---

 Key: SOLR-2731
 URL: https://issues.apache.org/jira/browse/SOLR-2731
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 3.1, 3.3, 4.0
Reporter: Jon Hoffman
  Labels: patch
 Fix For: 3.1.1, 3.3, 4.0

 Attachments: SOLR-2731.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 An optional parameter csv.numfound=true can be added to the request, which 
 causes the first line of the response to be the numfound.  This would have no 
 impact on existing behavior, and those who are interested in that value can 
 simply read off the first line before sending to their usual csv parser.




[jira] [Commented] (SOLR-2731) CSVResponseWriter should optionally return numfound

2011-08-25 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091227#comment-13091227
 ] 

Simon Rosenthal commented on SOLR-2731:
---

Good point. In that case I'm agnostic - option 1) would be fine.

 CSVResponseWriter should optionally return numfound
 ---

 Key: SOLR-2731
 URL: https://issues.apache.org/jira/browse/SOLR-2731
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 3.1, 3.3, 4.0
Reporter: Jon Hoffman
  Labels: patch
 Fix For: 3.1.1, 3.3, 4.0

 Attachments: SOLR-2731.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 An optional parameter csv.numfound=true can be added to the request, which 
 causes the first line of the response to be the numfound.  This would have no 
 impact on existing behavior, and those who are interested in that value can 
 simply read off the first line before sending to their usual csv parser.




[jira] [Updated] (SOLR-1725) Script based UpdateRequestProcessorFactory

2011-08-24 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-1725:
--

Attachment: SOLR-1725-rev1.patch

With the hope that this can be committed to trunk soon, I updated the patch to 
work with the reorganized sources in trunk, and made a couple of other small 
changes so that the tests would compile.
Some tests fail - I'm seeing
[junit] ERROR: SolrIndexSearcher opens=30 closes=28
[junit] junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher 
opens=30 closes=28

in the ScriptUpdateProcessorFactoryTest 



 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js files will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
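For illustration, wiring such a factory into an update chain might look like the following solrconfig.xml fragment. The element names and chain layout here follow the issue description above, not the patch itself, so treat them as assumptions:

```xml
<updateRequestProcessorChain name="script">
  <!-- scripts resolve under the conf/ directory in solr home;
       multiple files run in lexicographic order of file name -->
  <processor class="solr.ScriptUpdateProcessorFactory">
    <str name="scripts">scriptA.js,scriptB.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```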




[jira] [Commented] (SOLR-2664) Disable/enable autocommit on the fly

2011-08-23 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089690#comment-13089690
 ] 

Simon Rosenthal commented on SOLR-2664:
---

In working on extending the functionality to SolrJ/CSV/JSON, I've seen some 
shortcomings and ambiguities in the initial API I was working on.

here's a clearer approach:

The anticipated use case would be to disable autocommits temporarily during 
update processing from a content stream (or from a server.add() in solrJ). So 
for /update handlers where a content stream is specified, append the parameter 
deferAutoCommit=true to the URL. Autocommits will be disabled while the 
content stream is processed, and automatically re-enabled at the end of 
processing (regardless of success or failure). This will also be a recognized 
attribute in an add element for solr XML processing.

Additionally, when no content stream is specified, one can issue the standalone 
operations /update?disableAutoCommit=true and subsequently 
/update?enableAutoCommit=true, in the same way as you can specify commit as the 
only parameter. However, you're on your own if you issue one without the 
other.

SolrJ will have new server#disableAutoCommit and server#enableAutoCommit 
methods, and also add(doc, boolean deferAutoCommit).
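To make the proposal concrete, the request flow would look roughly like this. These parameters are proposed in this comment only and do not exist in Solr:

```
# autocommit suspended only while this content stream is processed,
# then re-enabled automatically (success or failure):
POST /solr/update?deferAutoCommit=true        (body: content stream)

# explicit pairing when no content stream is involved:
POST /solr/update?disableAutoCommit=true
  ... updates from any source ...
POST /solr/update?enableAutoCommit=true
```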








 Disable/enable autocommit on the fly
 

 Key: SOLR-2664
 URL: https://issues.apache.org/jira/browse/SOLR-2664
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2664.patch


 There are occasions when although autocommit is configured, it would be 
 desirable to disable it temporarily - for instance when batch adding/updating 
 a set of documents which should be committed atomically (e.g. a set of price 
 changes).
 The patch adds <disableAutoCommit/> and <enableAutoCommit/> commands to 
 XMLUpdateHandler, and also adds a disableAutoCommit=true|false attribute to 
 the <add> element - this will disable autocommit until the terminating </add> 
 at the end of the XML document is reached.
 At present, the autocommit state will not survive a core reload.
 It should be possible to extend this functionality to SolrJ, CSVUpdatehandler 
 ( and JSONUpdateHandler ?)




[jira] [Commented] (SOLR-2664) Disable/enable autocommit on the fly

2011-08-23 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089783#comment-13089783
 ] 

Simon Rosenthal commented on SOLR-2664:
---

bq. Rather than disabling autoCommit globally (for all clients), shouldn't it 
just disable it for that particular client? A different client may be adding 
some time sensitive documents.

That's a much better approach! I like the batch parameter. Is it in fact now 
possible to autocommit (or not, in this case) only for a particular content 
stream/batch, when multiple ones are being indexed simultaneously? (My 
understanding has always been that commits/autocommits were global in effect.)



 Disable/enable autocommit on the fly
 

 Key: SOLR-2664
 URL: https://issues.apache.org/jira/browse/SOLR-2664
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2664.patch


 There are occasions when although autocommit is configured, it would be 
 desirable to disable it temporarily - for instance when batch adding/updating 
 a set of documents which should be committed atomically (e.g. a set of price 
 changes).
 The patch adds <disableAutoCommit/> and <enableAutoCommit/> commands to 
 XMLUpdateHandler, and also adds a disableAutoCommit=true|false attribute to 
 the <add> element - this will disable autocommit until the terminating </add> 
 at the end of the XML document is reached.
 At present, the autocommit state will not survive a core reload.
 It should be possible to extend this functionality to SolrJ, CSVUpdatehandler 
 ( and JSONUpdateHandler ?)




[jira] [Updated] (SOLR-2703) Add support for the Lucene Surround Parser

2011-08-11 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-2703:
--

Attachment: SOLR-2703.patch

Updated patch - adds a test suite, support for configuring maxBasicQueries in 
the URL, and configuring it as one of the default parsers.

 Add support for the Lucene Surround Parser
 --

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Attachments: SOLR-2703.patch, SOLR-2703.patch


 The Lucene/contrib surround parser provides support for span queries. This 
 issue adds a Solr plugin for this parser




[jira] [Created] (SOLR-2709) processing a synonym in a token stream will remove the following token from the stream

2011-08-11 Thread Simon Rosenthal (JIRA)
processing a synonym in a token stream will remove the following token from the 
stream
--

 Key: SOLR-2709
 URL: https://issues.apache.org/jira/browse/SOLR-2709
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.0
 Environment: solr 4.0 built from trunk today
 
Solr Specification Version: 4.0.0.2011.08.11.16.00.45
Solr Implementation Version: 4.0-SNAPSHOT 1156711M - simon - 2011-08-11 
16:00:45
Lucene Specification Version: 4.0-SNAPSHOT
Lucene Implementation Version: 4.0-SNAPSHOT 1156556 - simon - 
2011-08-11 09:33:46
Current Time: Thu Aug 11 21:27:11 EDT 2011
Server Start Time:Thu Aug 11 20:59:11 EDT 2011

Reporter: Simon Rosenthal


If you do a phrase search on a field derived from a fieldtype with the synonym 
filter which includes a synonym, the term following the synonym vanishes after 
synonym expansion.

e.g. 
http://host:port/solr/corename/select/?q=desc:%22xyzzy%20%20bbb%20pot%20of%20gold%22&version=2.2&start=0&rows=10&indent=on&debugQuery=true
   (bbb is in the default synonyms file, desc is a text fieldtype)

outputs

<str name="rawquerystring">desc:"xyzzy  bbb pot of gold"</str>
<str name="querystring">desc:"xyzzy  bbb pot of gold"</str>
<str name="parsedquery">PhraseQuery(desc:"xyzzy  1  2 of gold")</str>
<str name="parsedquery_toString">desc:"xyzzy  1  2 of gold"</str>


You can also see this behavior using the admin console analysis.jsp

Solr 3.3 behaves properly.





[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser

2011-08-10 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082569#comment-13082569
 ] 

Simon Rosenthal commented on SOLR-2703:
---


+1 on eventually adding analysis support to the parser. 

The default (1024) for maxBasicQueries seems more than adequate but it wouldn't 
hurt to have it as a parameter.

I found that a field simply analyzed with lowercasing and the 
EnglishPluralStemmer gave decent results, and wildcard searches using the base 
form of the term will mostly compensate for lack of stemming support - all this 
can be documented in the javadocs and the Wiki.

 Add support for the Lucene Surround Parser
 --

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Attachments: SOLR-2703.patch


 The Lucene/contrib surround parser provides support for span queries. This 
 issue adds a Solr plugin for this parser




[jira] [Commented] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-08-09 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081893#comment-13081893
 ] 

Simon Rosenthal commented on LUCENE-2945:
-

Paul -
can you refactor the 2945d patch so that it will apply cleanly to the 
reorganized source tree, as the surround parser is now under modules/ ?

 Surround Query doesn't properly handle equals/hashcode
 --

 Key: LUCENE-2945
 URL: https://issues.apache.org/jira/browse/LUCENE-2945
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1.1, 4.0

 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
 LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, 
 LUCENE-2945d.patch


 In looking at using the surround queries with Solr, I am hitting issues 
 caused by collisions due to equals/hashcode not being implemented on the 
 anonymous inner classes that are created by things like DistanceQuery (branch 
 3.x, near line 76)




[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-08-09 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated LUCENE-2945:


Attachment: LUCENE-2945e.patch

LUCENE-2945e.patch uploaded.

Modified from 2945d as you suggested, and it applies cleanly to the 08/08 
nightly build.

 Surround Query doesn't properly handle equals/hashcode
 --

 Key: LUCENE-2945
 URL: https://issues.apache.org/jira/browse/LUCENE-2945
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1.1, 4.0

 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
 LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, 
 LUCENE-2945d.patch, LUCENE-2945e.patch


 In looking at using the surround queries with Solr, I am hitting issues 
 caused by collisions due to equals/hashcode not being implemented on the 
 anonymous inner classes that are created by things like DistanceQuery (branch 
 3.x, near line 76)




[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-08-09 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated LUCENE-2945:


Attachment: LUCENE-2945e.patch

Revised patch -- needed changes to package statements for a few files. Applies 
and compiles cleanly now.

 Surround Query doesn't properly handle equals/hashcode
 --

 Key: LUCENE-2945
 URL: https://issues.apache.org/jira/browse/LUCENE-2945
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1.1, 4.0

 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
 LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, 
 LUCENE-2945d.patch, LUCENE-2945e.patch, LUCENE-2945e.patch


 In looking at using the surround queries with Solr, I am hitting issues 
 caused by collisions due to equals/hashcode not being implemented on the 
 anonymous inner classes that are created by things like DistanceQuery (branch 
 3.x, near line 76)




[jira] [Created] (SOLR-2703) Add support for the Lucene Surround Parser

2011-08-09 Thread Simon Rosenthal (JIRA)
Add support for the Lucene Surround Parser
--

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor


The Lucene/contrib surround parser provides support for span queries. This 
issue adds a Solr plugin for this parser




[jira] [Updated] (SOLR-2703) Add support for the Lucene Surround Parser

2011-08-09 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-2703:
--

Attachment: SOLR-2703.patch

initial patch. No tests yet.


 Add support for the Lucene Surround Parser
 --

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Attachments: SOLR-2703.patch


 The Lucene/contrib surround parser provides support for span queries. This 
 issue adds a Solr plugin for this parser




[jira] [Issue Comment Edited] (SOLR-2703) Add support for the Lucene Surround Parser

2011-08-09 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081971#comment-13081971
 ] 

Simon Rosenthal edited comment on SOLR-2703 at 8/9/11 10:28 PM:


The most recent LUCENE-2945 patch is needed so that surround queries can be 
properly cached

  was (Author: simon.rosenthal):
The most recent patch is needed so that surround queries can be properly 
cached
  
 Add support for the Lucene Surround Parser
 --

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Attachments: SOLR-2703.patch


 The Lucene/contrib surround parser provides support for span queries. This 
 issue adds a Solr plugin for this parser




[jira] [Commented] (SOLR-2667) Finish Solr Admin UI

2011-07-27 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071897#comment-13071897
 ] 

Simon Rosenthal commented on SOLR-2667:
---

The status displayed for DIH indexing is not as detailed as that on the old 
page - I'd prefer the elapsed time with more precision, rather than 'n minutes 
ago'.

Since you're doing a status request every few seconds, would it be possible to 
add metrics such as 'documents processed per second'? (either for the last 
few seconds, or since the start of the import, or both)

 Finish Solr Admin UI
 

 Key: SOLR-2667
 URL: https://issues.apache.org/jira/browse/SOLR-2667
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 4.0

 Attachments: SOLR-2667-110722.patch


 In SOLR-2399, we added a new admin UI. The issue has gotten too long to 
 follow, so this is a new issue to track remaining tasks.




[jira] [Updated] (SOLR-1032) CSV loader to support literal field values

2011-07-25 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-1032:
--

Attachment: SOLR-1032.patch

Here's a first cut at a patch. The syntax for the literal field is 
f.fieldname.literal=literalvalue

The supplied literal value is not processed in any way (e.g. not split into 
multiple values, no quotes removed).

No tests yet.
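With that syntax, an upload request might look like the following sketch (the host, file name, and field name here are hypothetical, chosen only to illustrate the parameter):

```
curl 'http://localhost:8983/solr/update/csv?commit=true&f.datasource.literal=csv-import' \
     --data-binary @products.csv -H 'Content-Type: text/csv'
```

Every document in products.csv would then be indexed with datasource=csv-import, mirroring what ext.literal.* does for Solr Cell.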

 CSV loader to support literal field values
 --

 Key: SOLR-1032
 URL: https://issues.apache.org/jira/browse/SOLR-1032
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Erik Hatcher
Priority: Minor
 Attachments: SOLR-1032.patch


 It would be very handy if the CSV loader could handle a literal field 
 mapping, like the extracting request handler does.  For example, in a 
 scenario where you have multiple datasources (some data from a DB, some from 
 file crawls, and some from CSV) it is nice to add a field to every document 
 that specifies the data source.  This is easily done with DIH with a template 
 transformer, and Solr Cell with ext.literal.datasource=, but impossible 
 currently with CSV.




[jira] [Commented] (SOLR-1032) CSV loader to support literal field values

2011-07-25 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070939#comment-13070939
 ] 

Simon Rosenthal commented on SOLR-1032:
---

Patch is for 4.0

 CSV loader to support literal field values
 --

 Key: SOLR-1032
 URL: https://issues.apache.org/jira/browse/SOLR-1032
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Erik Hatcher
Priority: Minor
 Attachments: SOLR-1032.patch


 It would be very handy if the CSV loader could handle a literal field 
 mapping, like the extracting request handler does.  For example, in a 
 scenario where you have multiple datasources (some data from a DB, some from 
 file crawls, and some from CSV) it is nice to add a field to every document 
 that specifies the data source.  This is easily done with DIH with a template 
 transformer, and Solr Cell with ext.literal.datasource=, but impossible 
 currently with CSV.




[jira] [Created] (SOLR-2664) Disable/enable autocommit on the fly

2011-07-18 Thread Simon Rosenthal (JIRA)
Disable/enable autocommit on the fly


 Key: SOLR-2664
 URL: https://issues.apache.org/jira/browse/SOLR-2664
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Fix For: 4.0


There are occasions when although autocommit is configured, it would be 
desirable to disable it temporarily - for instance when batch adding/updating a 
set of documents which should be committed atomically (e.g. a set of price 
changes).

The patch adds <disableAutoCommit/> and <enableAutoCommit/> commands to 
XMLUpdateHandler, and also adds a disableAutoCommit="true|false" attribute to the 
<add> element - this will disable autocommit until the terminating </add> at 
the end of the XML document is reached.

At present, the autocommit state will not survive a core reload.

It should be possible to extend this functionality to SolrJ, CSVUpdateHandler 
(and JSONUpdateHandler?)
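From the description, the proposed markup would presumably look something like the following (a hypothetical sketch based only on the text above, not verified against the attached patch; the document contents are invented):

```xml
<!-- explicit enable/disable commands sent to the XML update handler -->
<disableAutoCommit/>
<!-- ... a batch of updates that should be committed atomically ... -->
<enableAutoCommit/>

<!-- or scoped to a single batch via an attribute on <add> -->
<add disableAutoCommit="true">
  <doc><field name="sku">123</field><field name="price">9.99</field></doc>
</add>
```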




[jira] [Updated] (SOLR-2664) Disable/enable autocommit on the fly

2011-07-18 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-2664:
--

Attachment: SOLR-2664.patch

Patch added

 Disable/enable autocommit on the fly
 

 Key: SOLR-2664
 URL: https://issues.apache.org/jira/browse/SOLR-2664
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0
Reporter: Simon Rosenthal
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2664.patch


 There are occasions when although autocommit is configured, it would be 
 desirable to disable it temporarily - for instance when batch adding/updating 
 a set of documents which should be committed atomically (e.g. a set of price 
 changes).
 The patch adds <disableAutoCommit/> and <enableAutoCommit/> commands to 
 XMLUpdateHandler, and also adds a disableAutoCommit="true|false" attribute to 
 the <add> element - this will disable autocommit until the terminating </add> 
 at the end of the XML document is reached.
 At present, the autocommit state will not survive a core reload.
 It should be possible to extend this functionality to SolrJ, CSVUpdateHandler 
 (and JSONUpdateHandler?)




[jira] [Commented] (SOLR-2580) Create a new Search Component to alter queries based on business rules.

2011-06-08 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046222#comment-13046222
 ] 

Simon Rosenthal commented on SOLR-2580:
---

Tomás:
I'm not sure why you would want to encapsulate these kinds of rules within Solr 
- an e-commerce site would always have an application layer between the UI and 
Solr, which seems like the logical place to apply business rules leading to 
modifying the request by adding boosts, specifying sort order, etc. 

Also, is Drools separate from JBoss (which is used relatively infrequently in 
the Solr community)?


 Create a new Search Component to alter queries based on business rules. 
 

 Key: SOLR-2580
 URL: https://issues.apache.org/jira/browse/SOLR-2580
 Project: Solr
  Issue Type: New Feature
Reporter: Tomás Fernández Löbbe

 The goal is to be able to adjust the relevance of documents based on 
 user-defined business rules.
 For example, in an e-commerce site, when the user chooses the shoes 
 category, we may be interested in boosting products from a certain brand. 
 This can be expressed as a rule in the following way:

 rule "Boost Adidas products when searching shoes"
 when
     $qt : QueryTool()
     TermQuery(term.field == "category", term.text == "shoes")
 then
     $qt.boost("{!lucene}brand:adidas");
 end

 The QueryTool object should be used to alter the main query in an easy way. 
 Even more human-like rules can be written:

 rule "Boost Adidas products when searching shoes"
 when
     Query has term "shoes" in field "product"
 then
     Add boost query "{!lucene}brand:adidas"
 end

 These rules are written in a text file in the config directory and can be 
 modified at runtime. Rules will be managed using JBoss Drools: 
 http://www.jboss.org/drools/drools-expert.html
 On a first stage, it will allow adding boost queries or changing sorting fields 
 based on the user query, but it could be extended to allow more options.




[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2011-01-18 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983215#action_12983215
 ] 

Simon Rosenthal commented on SOLR-445:
--

bq.  Don't allow autocommits during an update. Simple. Or, rather, all update 
requests block at the beginning during an autocommit. If an update request has 
too many documents, don't do so many documents in an update. (Lance)
Lance - how do you (dynamically) disable autocommits during a specific update? 
(That functionality would also be useful in other use cases, but that's 
another issue.)

bq. NOTE: This does change the behavior of Solr. Without this patch, the first 
document that is incorrect stops processing. Now, it continues merrily on 
adding documents as it can. Is this desirable behavior? It would be easy to 
abort on first error if that's the consensus, and I could take some tedious 
record-keeping out. I think there's no big problem with continuing on, since 
the state of committed documents is indeterminate already when errors occur so 
worrying about this should be part of a bigger issue.

I think it should be an option, if possible. I can see use cases where 
abort-on-first-error is desirable, but also situations where you know one or 
two documents may be erroneous, and it's worth continuing on in order to index 
the other 99%.
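The continue-vs-abort trade-off described above can be sketched client-side. This is an illustrative plain-Python sketch, not Solr's actual update code; `fake_add` is a hypothetical stand-in for the document-add call:

```python
def index_batch(docs, add_doc, abort_on_error=False):
    """Add documents one at a time, either aborting at the first failure
    or recording failures and carrying on (the option discussed above)."""
    failures = []
    for i, doc in enumerate(docs):
        try:
            add_doc(doc)
        except ValueError as err:
            failures.append((i, str(err)))
            if abort_on_error:
                break
    return failures

def fake_add(doc):
    # Hypothetical stand-in for the real add call; rejects one bad date.
    if doc.get("date") == "I_AM_A_BAD_DATE":
        raise ValueError("bad date")

docs = [{"id": 1}, {"id": 2, "date": "I_AM_A_BAD_DATE"}, {"id": 3}]
print(index_batch(docs, fake_add))  # continues past doc 2, reports one failure
```

With `abort_on_error=True` the loop stops at doc 2 and doc 3 is never attempted; with the default it indexes doc 3 and reports the single failure.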


 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Erick Erickson
 Fix For: Next

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 solr-445.xml


 Has anyone run into the problem of handling bad documents / failures mid 
 batch? I.e.:

 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>

 Right now Solr adds the first doc and then aborts. It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3. Option 1 would seem to be much harder to 
 accomplish and possibly require more memory, while Option 2 would require more 
 information to come back from the API. I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.




[jira] Commented: (SOLR-1911) File descriptor leak while indexing, may cause index corruption

2010-06-19 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880490#action_12880490
 ] 

Simon Rosenthal commented on SOLR-1911:
---

No - seems to have cleared up with trunk also.

I'm OK with closing it, but am really curious to know what changed between 
mid-May and today to clear up the problem.

 File descriptor leak while indexing, may cause index corruption
 ---

 Key: SOLR-1911
 URL: https://issues.apache.org/jira/browse/SOLR-1911
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.5
 Environment: Ubuntu Linux, Java build 1.6.0_16-b01
 Solr Specification Version: 3.0.0.2010.05.12.16.17.46
   Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 
 16:17:46 -- built from updated trunk
   Lucene Specification Version: 4.0-dev
   Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26
   Current Time: Thu May 13 12:21:12 EDT 2010
   Server Start Time:Thu May 13 11:45:41 EDT 2010
Reporter: Simon Rosenthal
Priority: Critical
 Attachments: indexlsof.tar.gz, openafteropt.txt


 While adding documents to an already existing index using this build, the 
 number of open file descriptors increases dramatically until the open-file 
 per-process limit is reached (1024), at which point there are error messages 
 in the log to that effect. If the server is restarted, the index may be corrupt.
 Commits are handled by autocommit every 60 seconds or 500 documents (usually 
 the time limit is reached first). 
 mergeFactor is 10.
 It looks as though each time a commit takes place, the number of open files 
 (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 
 about 40. There are several open file descriptors associated with each file in 
 the index.
 Rerunning the same index updates with an older Solr (built from trunk in Feb 
 2010) doesn't show this problem - the number of open files fluctuates up and 
 down as segments are created and merged, but stays basically constant.




[jira] Updated: (SOLR-1911) File descriptor leak while indexing, may cause index corruption

2010-05-17 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-1911:
--

Attachment: indexlsof.tar.gz

OK.. I built from latest trunk, used the schema associated with the index and 
example solrconfig.xml, as you asked.

- Started with a snapshot of the index taken before this issue reared its head

- used post.sh to add a file with around 800 documents (different one each time)
- did a commit (no autocommit)
- did an lsof on the process 

repeated the add/commit/lsof  5 times.

The attached tarball contains the lsof outputs, and we're still seeing the 
number of fds incrementing by 38-40 after each commit. I didn't go to the 
bitter end, but I assume we'd get there...

Here's a clue: I looked for file descriptors associated with one .prx file 
that was present in the original snapshot in each lsof output:

grep -c _r8.prx lsof.*
lsof.0:1
lsof.1:2
lsof.2:3
lsof.3:4
lsof.4:5
lsof.5:6

The .frq files seem to have the same pattern.

I'm assuming that's not good...
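The per-snapshot counting done above with grep can also be sketched as a small script. This is illustrative only; `fd_counts` and the sample lsof lines are fabricated for the example:

```python
from collections import Counter

def fd_counts(lsof_output, suffix):
    """Count open descriptors per file (matched by suffix) in one lsof
    snapshot. Comparing successive snapshots shows which files are never
    being closed."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        if fields and fields[-1].endswith(suffix):
            counts[fields[-1]] += 1
    return counts

# Two fabricated snapshots of `lsof -p <pid>` output (paths are made up)
snap0 = "java 14020 simon 99r REG 8,1 123 456 /idx/_r8.prx"
snap1 = snap0 + "\njava 14020 simon 100r REG 8,1 123 456 /idx/_r8.prx"
print(fd_counts(snap0, ".prx")["/idx/_r8.prx"])  # 1
print(fd_counts(snap1, ".prx")["/idx/_r8.prx"])  # 2 -- the growth pattern above
```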


 File descriptor leak while indexing, may cause index corruption
 ---

 Key: SOLR-1911
 URL: https://issues.apache.org/jira/browse/SOLR-1911
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.5
 Environment: Ubuntu Linux, Java build 1.6.0_16-b01
 Solr Specification Version: 3.0.0.2010.05.12.16.17.46
   Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 
 16:17:46 -- built from updated trunk
   Lucene Specification Version: 4.0-dev
   Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26
   Current Time: Thu May 13 12:21:12 EDT 2010
   Server Start Time:Thu May 13 11:45:41 EDT 2010
Reporter: Simon Rosenthal
Priority: Critical
 Attachments: indexlsof.tar.gz, openafteropt.txt


 While adding documents to an already existing index using this build, the 
 number of open file descriptors increases dramatically until the open-file 
 per-process limit is reached (1024), at which point there are error messages 
 in the log to that effect. If the server is restarted, the index may be corrupt.
 Commits are handled by autocommit every 60 seconds or 500 documents (usually 
 the time limit is reached first). 
 mergeFactor is 10.
 It looks as though each time a commit takes place, the number of open files 
 (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 
 about 40. There are several open file descriptors associated with each file in 
 the index.
 Rerunning the same index updates with an older Solr (built from trunk in Feb 
 2010) doesn't show this problem - the number of open files fluctuates up and 
 down as segments are created and merged, but stays basically constant.




[jira] Commented: (SOLR-1911) File descriptor leak while indexing, may cause index corruption

2010-05-15 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867880#action_12867880
 ] 

Simon Rosenthal commented on SOLR-1911:
---

bq: 1) what mechanism are you using to index content? ie: POSTing XML from a 
remote client? using the stream.url or stream.file params? Using SolrCell? 
using DIH? (and if you are using DIH, from what source? DB? HTTP? File? .. and 
with what transformers?)

   posting XML from a local client, not using stream.url or stream.file

bq: 2) what files does lsof show are open after each successive commit until the 
limit is reached? seeing how the file list grows - specifically which files are 
never getting closed - over time is really the only way to track down what code 
isn't closing files

will attach lsof output taken after it reached the limit

 File descriptor leak while indexing, may cause index corruption
 ---

 Key: SOLR-1911
 URL: https://issues.apache.org/jira/browse/SOLR-1911
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.5
 Environment: Ubuntu Linux, Java build 1.6.0_16-b01
 Solr Specification Version: 3.0.0.2010.05.12.16.17.46
   Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 
 16:17:46 -- built from updated trunk
   Lucene Specification Version: 4.0-dev
   Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26
   Current Time: Thu May 13 12:21:12 EDT 2010
   Server Start Time:Thu May 13 11:45:41 EDT 2010
Reporter: Simon Rosenthal
Priority: Critical

 While adding documents to an already existing index using this build, the 
 number of open file descriptors increases dramatically until the open-file 
 per-process limit is reached (1024), at which point there are error messages 
 in the log to that effect. If the server is restarted, the index may be corrupt.
 Commits are handled by autocommit every 60 seconds or 500 documents (usually 
 the time limit is reached first). 
 mergeFactor is 10.
 It looks as though each time a commit takes place, the number of open files 
 (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 
 about 40. There are several open file descriptors associated with each file in 
 the index.
 Rerunning the same index updates with an older Solr (built from trunk in Feb 
 2010) doesn't show this problem - the number of open files fluctuates up and 
 down as segments are created and merged, but stays basically constant.




[jira] Updated: (SOLR-1911) File descriptor leak while indexing, may cause index corruption

2010-05-15 Thread Simon Rosenthal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Rosenthal updated SOLR-1911:
--

Attachment: openafteropt.txt

lsof output after the error occurred

 File descriptor leak while indexing, may cause index corruption
 ---

 Key: SOLR-1911
 URL: https://issues.apache.org/jira/browse/SOLR-1911
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.5
 Environment: Ubuntu Linux, Java build 1.6.0_16-b01
 Solr Specification Version: 3.0.0.2010.05.12.16.17.46
   Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 
 16:17:46 -- built from updated trunk
   Lucene Specification Version: 4.0-dev
   Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26
   Current Time: Thu May 13 12:21:12 EDT 2010
   Server Start Time:Thu May 13 11:45:41 EDT 2010
Reporter: Simon Rosenthal
Priority: Critical
 Attachments: openafteropt.txt


 While adding documents to an already existing index using this build, the 
 number of open file descriptors increases dramatically until the open-file 
 per-process limit is reached (1024), at which point there are error messages 
 in the log to that effect. If the server is restarted, the index may be corrupt.
 Commits are handled by autocommit every 60 seconds or 500 documents (usually 
 the time limit is reached first). 
 mergeFactor is 10.
 It looks as though each time a commit takes place, the number of open files 
 (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 
 about 40. There are several open file descriptors associated with each file in 
 the index.
 Rerunning the same index updates with an older Solr (built from trunk in Feb 
 2010) doesn't show this problem - the number of open files fluctuates up and 
 down as segments are created and merged, but stays basically constant.




[jira] Created: (SOLR-1911) File descriptor leak while indexing, may cause index corruption

2010-05-13 Thread Simon Rosenthal (JIRA)
File descriptor leak while indexing, may cause index corruption
---

 Key: SOLR-1911
 URL: https://issues.apache.org/jira/browse/SOLR-1911
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.5
 Environment: Ubuntu Linux, Java build 1.6.0_16-b01
Solr Specification Version: 3.0.0.2010.05.12.16.17.46
Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 
16:17:46 -- built from updated trunk
Lucene Specification Version: 4.0-dev
Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26
Current Time: Thu May 13 12:21:12 EDT 2010
Server Start Time:Thu May 13 11:45:41 EDT 2010
Reporter: Simon Rosenthal
Priority: Critical


While adding documents to an already existing index using this build, the 
number of open file descriptors increases dramatically until the open-file 
per-process limit is reached (1024), at which point there are error messages 
in the log to that effect. If the server is restarted, the index may be corrupt.

the solr log reports:

May 13, 2010 12:37:04 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
May 13, 2010 12:37:04 PM org.apache.solr.update.DirectUpdateHandler2$CommitTracker run
SEVERE: auto commit error...
java.io.FileNotFoundException: /home/simon/rig2/solr/core1/data/index/_j2.nrm (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:80)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:67)
        at org.apache.lucene.index.SegmentReader.openNorms(SegmentReader.java:1093)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:532)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:634)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:610)
        at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:1012)
        at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:4563)
        at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3775)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3623)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3614)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1769)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1732)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1696)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:230)
        at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
        at org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:602)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
May 13, 2010 12:37:04 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 1
May 13, 2010 12:37:04 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: directory '/home/simon/rig2/solr/core1/data/index' exists and is a directory, but cannot be listed: list() returned null
        at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:223)
        at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:234)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:582)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:535)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:316)
        at

[jira] Commented: (SOLR-1750) SystemStatsRequestHandler - replacement for stats.jsp

2010-02-03 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829354#action_12829354
 ] 

Simon Rosenthal commented on SOLR-1750:
---


+1 on SolrStatsRequestHandler

You might want to consider either omitting or making optional the Lucene 
FieldCache stats; they can often be *very* slow to generate (see 
http://www.lucidimagination.com/search/document/5ba908577d2e4c25/stats_page_slow_in_latest_nightly#2f40166c25f9bfa0
 ). One use case for this request handler that I can see is high-frequency 
(every few seconds) monitoring as part of performance testing, for which a 
fast response is pretty much mandatory.
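Such a high-frequency monitor could be as simple as the following sketch (illustrative Python, not part of Solr; `fetch_stats` stands in for the actual HTTP call to the handler):

```python
import time

def poll(fetch_stats, interval_s, iterations):
    """Call a stats endpoint repeatedly and time each call; slow FieldCache
    stats would show up directly as large timings here."""
    timings = []
    for _ in range(iterations):
        start = time.monotonic()
        fetch_stats()
        timings.append(time.monotonic() - start)
        time.sleep(interval_s)
    return timings

# fetch_stats is a hypothetical stand-in for the HTTP request to the handler
timings = poll(lambda: None, interval_s=0.0, iterations=3)
print(len(timings))  # 3
```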



 SystemStatsRequestHandler - replacement for stats.jsp
 -

 Key: SOLR-1750
 URL: https://issues.apache.org/jira/browse/SOLR-1750
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Trivial
 Fix For: 1.5

 Attachments: SystemStatsRequestHandler.java


 stats.jsp is cool and all, but suffers from escaping issues, and also is not 
 accessible from SolrJ or other standard Solr APIs.
 Here's a request handler that emits everything stats.jsp does.
 For now, it needs to be registered in solrconfig.xml like this:
 {code}
 <requestHandler name="/admin/stats" class="solr.SystemStatsRequestHandler" />
 {code}
 But will register this in AdminHandlers automatically before committing.




[jira] Commented: (SOLR-1603) Perl Response Writer

2010-01-29 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806586#action_12806586
 ] 

Simon Rosenthal commented on SOLR-1603:
---


The patch installed fine.

+1 for committing it.

The output is a complex Perl data structure containing search results, which 
would presumably be assigned to a variable immediately - not eval'd. I 
absolutely agree with Erik and Yonik - I can't think of a realistic case in 
which this would present a security risk.

 Perl Response Writer
 

 Key: SOLR-1603
 URL: https://issues.apache.org/jira/browse/SOLR-1603
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Reporter: Claudio Valente
Priority: Minor
 Attachments: SOLR-1603.2.patch, SOLR-1603.patch


 I've made a patch that implements a Perl response writer for Solr.
 It's nan/inf and unicode aware.
 I don't know whether some fields can be binary but if so I can probably 
 extend it to support that.




[jira] Commented: (SOLR-1603) Perl Response Writer

2010-01-28 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806097#action_12806097
 ] 

Simon Rosenthal commented on SOLR-1603:
---


Couldn't build Solr with the current patch (against trunk), probably because 
the package for ResponseWriters was recently changed to 
org.apache.solr.response.

 Perl Response Writer
 

 Key: SOLR-1603
 URL: https://issues.apache.org/jira/browse/SOLR-1603
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Reporter: Claudio Valente
Priority: Minor
 Attachments: SOLR-1603.patch


 I've made a patch that implements a Perl response writer for Solr.
 It's nan/inf and unicode aware.
 I don't know whether some fields can be binary but if so I can probably 
 extend it to support that.




[jira] Commented: (SOLR-1603) Perl Response Writer

2009-11-25 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782605#action_12782605
 ] 

Simon Rosenthal commented on SOLR-1603:
---

I'd be curious to know what the use case is for this - I've used both the JSON 
and XML writers to return results to a Perl web application with few problems.



 Perl Response Writer
 

 Key: SOLR-1603
 URL: https://issues.apache.org/jira/browse/SOLR-1603
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Reporter: Claudio Valente
 Attachments: SOLR-1603.patch


 I've made a patch that implements a Perl response writer for Solr.
 It's nan/inf and unicode aware.
 I don't know whether some fields can be binary but if so I can probably 
 extend it to support that.




[jira] Created: (SOLR-1509) Admin UI display of schema.xml can't find schema file

2009-10-13 Thread Simon Rosenthal (JIRA)
Admin UI display of schema.xml can't find schema file
-

 Key: SOLR-1509
 URL: https://issues.apache.org/jira/browse/SOLR-1509
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Simon Rosenthal
Priority: Minor



This is in a multicore environment; solr.xml contains:

<solr sharedLib="lib" shareSchema="true" persistent="true">
  <cores adminPath="/admin/cores">
    <core default="true" instanceDir="core1" name="core1"
          config="/home/solrdata/rig1/conf/solrconfig.xml"
          schema="/home/solrdata/rig1/conf/schema.xml">
      <property name="dataDir" value="/home/solrdata/rig1/core1" />
    </core>
    <core default="false" instanceDir="core2" name="core2">
      ... schema same as above
    </core>
    ...
  </cores>
</solr>

When I go to the URL /solr/core1/admin/ and click on the schema link, 
the URL displayed by the browser is

http://host:port/solr/core1/admin/file/?file=/home/solrdata/rig1/conf/schema.xml

which looks correct, but an HTTP 400 error of the following form is displayed:

Can not find: schema.xml 
[/path/to/core1/conf/directory/home/solrdata/rig1/conf/schema.xml]

It looks as though Solr is blindly appending the schema.xml path to the conf 
directory, even though it's not a relative one. 
The same happens for the other cores.

The schema browser link on the admin page works fine.
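In pseudocode terms, the fix this report implies is to honor absolute paths instead of unconditionally prepending the conf directory. This is an illustrative Python sketch only; the actual Solr admin code is Java, and `resolve_config_file` is a hypothetical name:

```python
import os.path

def resolve_config_file(conf_dir, name):
    """Return `name` unchanged if it is already absolute; only treat it as
    relative to the core's conf directory when it actually is relative.
    (The reported behavior effectively skips the isabs check and always
    concatenates.)"""
    if os.path.isabs(name):
        return name
    return os.path.join(conf_dir, name)

print(resolve_config_file("/path/to/core1/conf", "schema.xml"))
print(resolve_config_file("/path/to/core1/conf", "/home/solrdata/rig1/conf/schema.xml"))
```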


