Google-developed posting list encoding
Can be quite a bit faster than vInt in some cases: http://www.ir.uwaterloo.ca/book/addenda-06-index-compression.html -Mike - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
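For context, Lucene's vInt is a simple variable-byte code: 7 payload bits per byte, low-order groups first, with the high bit flagging that more bytes follow. A minimal sketch of the idea (a hypothetical helper class, not Lucene's actual IndexInput/IndexOutput API):

```java
import java.io.ByteArrayOutputStream;

public class VInt {
    // Encode a non-negative int as 1-5 bytes, 7 bits per byte,
    // continuation bit set on every byte except the last.
    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Decode a vInt starting at offset 0 of the array.
    public static int decode(byte[] bytes) {
        int value = 0, shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break; // last byte reached
            shift += 7;
        }
        return value;
    }
}
```

Group-varint-style schemes gain speed over this byte-at-a-time loop mainly by removing the per-byte branch, which is what the linked addendum measures.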
Re: [jira] Created: (SOLR-1363) Search without using caches
Keep in mind that there is no way to bypass the most important cache of all (OS disk cache). -Mike On Thu, Aug 13, 2009 at 12:01 PM, Jason Rutherglen (JIRA) j...@apache.org wrote: For testing, I often need to perform a query and see the actual time it takes (rather than the time it takes to look it up from the cache). We'll need various options such as bypass the docsets, docs, or results. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Updated: (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
I'd like to take a look at this but JIRA seems to be down. Is anyone else experiencing this? -Mike On Wed, May 13, 2009 at 7:41 AM, Jayson Minard (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Jayson Minard updated SOLR-1155: Attachment: Solr-1155.patch Resolve TODO for commitWithin, and updated AutoCommitTrackerTest to validate the fix. Change DirectUpdateHandler2 to allow concurrent adds during an autocommit - Key: SOLR-1155 URL: https://issues.apache.org/jira/browse/SOLR-1155 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Reporter: Jayson Minard Attachments: Solr-1155.patch, Solr-1155.patch Currently DirectUpdateHandler2 will block adds during a commit, and it seems to be possible with recent changes to Lucene to allow them to run concurrently. See: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html
[jira] Commented: (SOLR-1169) SortedIntDocSet
[ https://issues.apache.org/jira/browse/SOLR-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12709645#action_12709645 ] Mike Klaas commented on SOLR-1169: -- sweet. intersecting sorted int sets should be faster in the general case. HashSet will of course win when one set is very small, but I expect this to still be pretty fast anyway. SortedIntDocSet --- Key: SOLR-1169 URL: https://issues.apache.org/jira/browse/SOLR-1169 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 1.4 A DocSet type that can skip to support SOLR-1165
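The reason sorted int docsets intersect quickly in the general case is that intersection reduces to a linear merge with good cache locality (and can skip ahead when one side is much denser). A minimal sketch of the merge step — illustrative only, not Solr's actual SortedIntDocSet code:

```java
public class SortedIntIntersect {
    // Intersect two ascending, duplicate-free int arrays by linear merge.
    // Advance whichever pointer holds the smaller value; a match advances both.
    public static int intersectionSize(int[] a, int[] b) {
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                count++;
                i++;
                j++;
            }
        }
        return count;
    }
}
```

When one set is tiny, a HashSet (or binary/galloping search into the larger array) wins because it skips most of the larger side entirely, which matches the comment above.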
Re: DirectUpdateHandler2 threads pile up behind scheduleCommitWithin
Hi Jayson, Thanks, I'll take a look in the next few days. The current patch doesn't guarantee index consistency during post-commit callback hooks, right? This could be a problem for index replication. (Incidentally, I'm rather unfamiliar with the new java-based replication design. Anyone care to comment on the implications?) cheers, -Mike On 10-May-09, at 10:54 AM, jayson.minard wrote: Mike, I revamped the DirectUpdateHandler2 into DirectUpdateHandler3 in SOLR-1155, probably ready enough for your review to see if locking makes sense for current Lucene behavior. https://issues.apache.org/jira/browse/SOLR-1155 --j Mike Klaas wrote: On 7-May-09, at 10:36 AM, jayson.minard wrote: Does every thread really need to notify the update handler of the commit interval/threshold being reached, or really just the first thread that notices should send the signal, or better yet a background commit watching thread so that no foreground thread has to pay attention at all. That is assuming they wouldn't need to block like they are now for a reason I'm likely unaware of... This is due to the way Lucene was designed (although recent improvements in Lucene mean we can do better here). See the recent thread Autocommit blocking adds? on solr-user for a related discussion. As the person who first wrote the multi-threaded-ness of DUH2, I'd be very happy to promptly review any improvements made to it. -Mike -- View this message in context: http://www.nabble.com/DirectUpdateHandler2-threads-pile-up-behind-scheduleCommitWithin-tp23431691p23472391.html Sent from the Solr - Dev mailing list archive at Nabble.com.
Re: DirectUpdateHandler2 threads pile up behind scheduleCommitWithin
On 7-May-09, at 10:36 AM, jayson.minard wrote: Does every thread really need to notify the update handler of the commit interval/threshold being reached, or really just the first thread that notices should send the signal, or better yet a background commit watching thread so that no foreground thread has to pay attention at all. That is assuming they wouldn't need to block like they are now for a reason I'm likely unaware of... This is due to the way Lucene was designed (although recent improvements in Lucene mean we can do better here). See the recent thread Autocommit blocking adds? on solr-user for a related discussion. As the person who first wrote the multi-threaded-ness of DUH2, I'd be very happy to promptly review any improvements made to it. -Mike
Re: Welcome new Solr committers Mark Miller and Noble Paul
On 30-Apr-09, at 10:41 AM, Yonik Seeley wrote: I'm pleased to announce that Mark Miller and Noble Paul have accepted invitations to become Solr committers! Welcome Mark and Noble, and thanks for all your great work on Solr! Congratulations Mark and Noble! Good to have you on board. -Mike
[jira] Commented: (SOLR-1116) Add a Binary FieldType
[ https://issues.apache.org/jira/browse/SOLR-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704284#action_12704284 ] Mike Klaas commented on SOLR-1116: -- +1 for url-safe base64 (-_ being the extra chars) Add a Binary FieldType -- Key: SOLR-1116 URL: https://issues.apache.org/jira/browse/SOLR-1116 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Noble Paul Fix For: 1.4 Attachments: SOLR-1116.patch, SOLR-1116.patch Lucene supports binary data for fields but Solr has no corresponding field type.
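For illustration, URL-safe base64 is the standard alphabet with '+' and '/' swapped for '-' and '_', so encoded values survive URLs and query strings without escaping. A small sketch using java.util.Base64 (a later-JDK utility than Solr targeted at the time, shown here only to demonstrate the alphabet):

```java
import java.util.Base64;

public class UrlSafeB64 {
    // Standard base64 alphabet with '+' -> '-' and '/' -> '_';
    // padding is dropped since '=' also needs escaping in URLs.
    public static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    public static byte[] decode(String s) {
        return Base64.getUrlDecoder().decode(s);
    }
}
```

For example, the bytes 0xFB 0xFF encode as "+/8=" in standard base64 but "-_8" in the URL-safe, unpadded form.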
Re: Modularization
On 23-Mar-09, at 2:41 PM, Michael McCandless wrote: I agree, but at least we need some clear criteria so the future decision process is more straightforward. Towards that... it seems like there are good reasons why something should be put into contrib: * It uses a version of JDK higher than what core can allow * It has external dependencies * Its quality is debatable (or at least not proven) * It's of somewhat narrow usage/interest (eg: contrib/bdb) But I don't think "it doesn't have to be in core" (the software modularity goal) is the right reason to put something in contrib. Agreed. I don't think that building on the existing 'contrib' is the way to go. Frequently-used, high-quality components should be more properly part of Lucene, whether that means that they move to core, or in a new blessed modules section. Getting back to the original topic: Trie(Numeric)RangeFilter runs on JDK 1.4, has no external dependencies, looks to be high quality, and likely will have wide appeal. Doesn't it belong in core? +1. It is important that Lucene come blessed with very good quality defaults. Fast range queries are a common requirement. Similarly, I wouldn't be happy to have a new, wicked QueryParser be relegated to contrib where it is unlikely to be found by non-savvy users. At the very least, I agree with Michael that it should be findable in the same place. It does make sense to separate the machinery/building blocks (base Query, Weight, Scorer, Filter classes, Similarity interface, etc.) from the Query/Filter implementations that use them. But whether this is done by putting them in separate directories or via a global core/modules distinction seems unimportant. -Mike
[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688449#action_12688449 ] Mike Klaas commented on LUCENE-1561: I agree that it is going to be almost impossible to convey that phrase queries don't work by renaming the flag. I agree with Eks Dev that a positive formulation is the only chance, although this deviates from the current omit* flags. termPresenceOnly() trackTermPresenceOnly() onlyTermPresence() omitEverythingButTermPresence() // just kidding Maybe rename Field.omitTf, and strengthen the javadocs -- Key: LUCENE-1561 URL: https://issues.apache.org/jira/browse/LUCENE-1561 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 2.9 Attachments: LUCENE-1561.patch Spinoff from here: http://www.nabble.com/search-problem-when-indexed-using-Field.setOmitTf()-td22456141.html Maybe rename omitTf to something like omitTermPositions, and make it clear what queries will silently fail to work as a result.
Re: Getting tokens from search results. Simple concept
On 5-Mar-09, at 2:42 PM, Chris Hostetter wrote: : What I would LOVE is if I could do it in a standard Lucene search like I : mentioned earlier. : Hit.doc[0].getHitTokenList() :confused: : Something like this... The Query/Scorer APIs don't provide any mechanism for information like that to be conveyed back up the call chain -- mainly because it's more heavy weight than most people need. If you have custom Query/Scorer implementations, you can keep track of whatever state you want when executing a Query -- in fact the SpanQuery family of queries do keep track of exactly the type of info you seem to want, and after executing a query, you can ask it for the Spans of any matching document -- the down side is a loss in performance of query execution (because it takes time/memory to keep track of all the matches) Even then, if I'm not mistaken, spans track token _positions_, not _offsets_ in the original string. A reverse text index like Lucene is fast precisely because it doesn't have to keep track of this information. I think the best alternative might be to use termvectors, which are essentially a cache of the analyzed tokens for a document. -Mike
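To make the positions-vs-offsets distinction concrete: a position is the token's ordinal in the token stream (what spans track), while offsets are character indices into the original text (what you need to highlight the source string). A toy whitespace tokenizer recording both — illustrative only, not Lucene's analysis API:

```java
import java.util.ArrayList;
import java.util.List;

public class PosVsOffset {
    // Returns "term:position:startOffset:endOffset" for each
    // space-separated token. Position counts tokens; offsets count chars.
    public static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        int pos = 0, i = 0;
        while (i < text.length()) {
            while (i < text.length() && text.charAt(i) == ' ') i++; // skip spaces
            int start = i;
            while (i < text.length() && text.charAt(i) != ' ') i++; // consume token
            if (i > start) {
                out.add(text.substring(start, i) + ":" + (pos++) + ":" + start + ":" + i);
            }
        }
        return out;
    }
}
```

For "the quick fox", the token "fox" has position 2 but character offsets 10-13; a positions-only index can answer phrase queries yet cannot point back into the original string, which is why term vectors (which can store offsets) are suggested above.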
[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication
[ https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12679466#action_12679466 ] Mike Klaas commented on SOLR-1044: -- {quote} I haven't yet seen a HTTP server serving more than around 1200 req/sec (apache HTTPD). A call based server can serve 4k-5k messages easily. (I am yet to test hadoop RPC). The proliferation of a large no. of frameworks around that is a testimony to the superiority of that approach. {quote} up to 50,000 req/sec, with keepalive: http://www.litespeedtech.com/web-server-performance-comparison-litespeed-2.0-vs.html Use Hadoop RPC for inter Solr communication --- Key: SOLR-1044 URL: https://issues.apache.org/jira/browse/SOLR-1044 Project: Solr Issue Type: New Feature Components: search Reporter: Noble Paul Solr uses http for distributed search. We can make it a whole lot faster if we use an RPC mechanism which is more lightweight/efficient. Hadoop RPC looks like a good candidate for this. The implementation should just have one protocol. It should follow the Solr's idiom of making remote calls. A uri + params + [optional stream(s)]. The response can be a stream of bytes. To make this work we must make the SolrServer implementation pluggable in distributed search. Users should be able to choose between the current CommonsHttpSolrServer, or a HadoopRpcSolrServer.
[jira] Commented: (SOLR-952) duplicated code in (Default)SolrHighlighter and HighlightingUtils
[ https://issues.apache.org/jira/browse/SOLR-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674830#action_12674830 ] Mike Klaas commented on SOLR-952: - HighlightingUtils has been deprecated for at least one release; can't we just rip it out? duplicated code in (Default)SolrHighlighter and HighlightingUtils - Key: SOLR-952 URL: https://issues.apache.org/jira/browse/SOLR-952 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.4 Reporter: Chris Harris Priority: Minor Attachments: SOLR-952.patch A large quantity of code is duplicated between the deprecated HighlightingUtils class and the newer SolrHighlighter and DefaultSolrHighlighter (which have been getting bug fixes and enhancements). The Utils class is no longer used anywhere in Solr, but people writing plugins may be taking advantage of it, so it should be cleaned up.
Re: [VOTE] LOGO
On 17-Dec-08, at 5:11 PM, Ryan McKinley wrote: Hoss - can you go ahead and post something? I'm heading out... but could post tomorrow. Since the community has been notified, any objections to me updating the site with the new logo/favicon? -Mike
Re: [jira] Commented: (SOLR-912) org.apache.solr.common.util.NamedList - Typesafe efficient variant - ModernNamedList introduced - implementing the same API as NamedList
On 19-Dec-08, at 8:27 AM, Kay Kay (JIRA) wrote: Meanwhile - w.r.t resize() - ( trade-off because increasing size a lot would increase memory usage. increase a size by a smaller factor would be resulting in a more frequent increases in size). I believe reading some theory that the ideal increase factor is somewhere close to ( 1 + 2^0.5) / 2 or something similar to that. It should be benchmarked, but yes, a factor of two is typically more memory wasteful than the performance it gains (you have a 50% chance of wasting at least 1/4 of your memory, a 25% chance of wasting at least 3/8th, etc.) The method - ensureCapacity(capacity) in ArrayList (Java 6) also seems to be a number along the lines ~ (1.5) int newCapacity = (oldCapacity * 3)/2 + 1; +1 seems to be move away from 0, and keep incrementing the count. ( Hmm .. That piece of code - in Java 6 ArrayList can definitely make use of bitwise operators for the div-by-2 operation !!). Let's not go crazy here guys. This relatively trivial calculation is only called log(n) times, and certainly uses bit ops after the jit gets its hands on it. -Mike
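For reference, the ArrayList line quoted above gives a growth factor of roughly 1.5; the theoretical sweet spot often cited in this discussion is related to the golden ratio (1+√5)/2 ≈ 1.618, below which freed blocks can eventually be reused by later allocations. A small sketch comparing the two policies (illustrative, not the NamedList patch itself):

```java
public class Growth {
    // ArrayList-style growth (~1.5x), as quoted from the JDK 6 source.
    public static int grow15(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    // Naive doubling, which wastes more memory on average:
    // right after a resize, just over half the array is unused.
    public static int grow2(int oldCapacity) {
        return oldCapacity * 2;
    }
}
```

As the thread notes, this calculation runs only O(log n) times for n appends, so micro-optimizing it (e.g. shifting instead of dividing) buys essentially nothing.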
Re: [VOTE] LOGO
On 13-Dec-08, at 2:52 PM, Ryan McKinley wrote: Ok, all votes are cast (except Grant who is abstaining) Thanks for tallying the votes, Ryan. You're too damn quick for me! -Mike
Re: [VOTE] LOGO
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png https://issues.apache.org/jira/secure/attachment/12394218/solr-solid.png
Re: [VOTE] LOGO
I agree. I don't see why there needs to be a minimum or maximum number of logos to rank per vote. -Mike On 10-Dec-08, at 7:52 PM, Yonik Seeley wrote: Doesn't limiting to top 4 defeat the purpose of using STV to overcome splitting-the-vote? Seems like we should rank the whole list (or all that an individual finds acceptable) -Yonik On Wed, Dec 10, 2008 at 8:51 PM, Ryan McKinley ryan...@gmail.com wrote: This thread is for solr committers to list the top 4 logos preferences from the community logo contest. As a guide, we should look at: http://people.apache.org/~ryan/solr-logo-results.html The winner will be tabulated using instant runoff voting -- if this happens to result in a tie, the winner will be picked by the 'Single transferable vote' http://en.wikipedia.org/wiki/Instant-runoff_voting http://en.wikipedia.org/wiki/Single_transferable_vote To cast a valid vote, you *must* include 4 options. ryan
Re: logo contest
On 8-Dec-08, at 10:47 AM, Mike Klaas wrote: On 7-Dec-08, at 7:40 PM, Chris Hostetter wrote: : I would personally prefer more of an elimination-style vote (i.e., STV). Ah... yeah, that seems like it would be a more fair way to deal with things than my suggestion, and it doesn't violate the spirit of the rules as originally outlined (it's still a vote of ranked preferences). Are you volunteering to do the vote counting Mike? Sure thing. I take it that there are no objections? If so, I'll call a vote by the end of the week. cheers, -Mike
Re: logo contest
On 10-Dec-08, at 12:41 PM, Yonik Seeley wrote: Sure thing. I take it that there are no objections? If so, I'll call a vote by the end of the week. +1 I just wish we had used this method with the community vote. I guess as a committer I should try and figure out what order the community would have voted and do that. I could run the results of the community vote interpreted as STV, if that would help (it'll be a few days, though). -Mike
Re: logo contest
On 7-Dec-08, at 7:40 PM, Chris Hostetter wrote: : I would personally prefer more of an elimination-style vote (i.e., STV). Ah... yeah, that seems like it would be a more fair way to deal with things than my suggestion, and it doesn't violate the spirit of the rules as originally outlined (it's still a vote of ranked preferences). Are you volunteering to do the vote counting Mike? Sure thing. -Mike
Re: logo contest
On 4-Dec-08, at 2:33 PM, Chris Hostetter wrote: : Being the likely two candidates for winning. My guess is that : narrowing to the two most popular options first would make #2 the : winner, while voting on the top 10 (w/o any strategy for winning) : would make #1 the winner. limiting to only voting for the top 2 seems unrepresentative since more than one apache_solr_c_red.jpg variant tied for 2nd. : fun, fun. So people who want one of these options to win should vote : only for that option, really. Perhaps instead of just ranking top 5, we should ask committers to rank all of the choices on the final ballot to eliminate the strategy factor you are referring to ... i think we can trust all committers to understand this, but if someone botches it (or refuses?) we'll just shift the number of points each item earns down by the appropriate number (so if you want your 1st rank to earn 10 points, you must list all 10, if you only list 4 then your top ranked item only earns 4 points) Eliminating strategic voting merely biases the outcome toward the logo without the vote splitting problem. That is no solution. It is better to allow strategic voting, as that is the only way for voters to express certain preferences in this system. I would personally prefer more of an elimination-style vote (i.e., STV). Each voter lists the logos they prefer, in order. The logos are ranked by first place votes. The last in the rank is eliminated from the contest, and anyone who had that logo as their first-place vote has their vote transferred to the next logo on the list, if any. Iterate until two logos remain. There is no danger of vote-splitting and the outcome maximizes global welfare in terms of binary preferences (well, probably not, due to Arrow's theorem, but it does a good job regardless). -Mike
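The elimination procedure described above (rank by first-place votes, drop the last-place candidate, transfer its ballots to each voter's next surviving choice, repeat) can be sketched directly. This is an illustrative instant-runoff count, not the script actually used for the tally; ties on elimination are broken arbitrarily here, whereas a real count would need a defined tie-break rule:

```java
import java.util.*;

public class Irv {
    // Each ballot is an ordered list of candidate names, most preferred first.
    public static String winner(List<List<String>> ballots) {
        Set<String> alive = new HashSet<>();
        for (List<String> b : ballots) alive.addAll(b);
        while (true) {
            // Count first-place votes among surviving candidates.
            Map<String, Integer> firsts = new HashMap<>();
            for (String c : alive) firsts.put(c, 0);
            for (List<String> b : ballots) {
                for (String c : b) {
                    if (alive.contains(c)) { firsts.merge(c, 1, Integer::sum); break; }
                }
            }
            String best = null, worst = null;
            for (Map.Entry<String, Integer> e : firsts.entrySet()) {
                if (best == null || e.getValue() > firsts.get(best)) best = e.getKey();
                if (worst == null || e.getValue() < firsts.get(worst)) worst = e.getKey();
            }
            // Stop on a majority, or when only one candidate survives.
            if (firsts.get(best) * 2 > ballots.size() || alive.size() == 1) return best;
            alive.remove(worst); // eliminate; ballots transfer next round
        }
    }
}
```

With ballots 4x[A,C], 3x[B,C], 2x[C,B], plurality would pick A, but eliminating C transfers two ballots to B, who then holds a 5-of-9 majority — exactly the vote-splitting scenario the thread is worried about.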
Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
On 19-Nov-08, at 5:12 AM, Michael McCandless (JIRA) wrote: How can the VM system possibly make good decisions about what to swap out? It can't know if a page is being used for terms dict index, terms dict, norms, stored fields, postings. LRU is not a good policy, because some pages (terms index) are far far more costly to miss than others. A note on this discussion: we recently re-architected a large database-y, lucene-y system to use mmap-based storage and are extremely pleased with the performance. Sharing the buffers among processes is rather cool, as Marvin mentions, as is the near-instantaneous startup. -Mike
Optimizing range constraints
Tim Sturge posted a nice optimization for range constraints/filters (e.g. age:[10 TO 35]) here: https://issues.apache.org/jira/browse/LUCENE-1461 It has a natural applicability to Solr's fq range filters, which can be abysmally slow for large ranges. Could be an interesting project for contributors who love optimizing speed (100-fold, in this case) <g>. I'd definitely do it had I the time. -Mike
Re: Deadlock with DirectUpdateHandler2
On 18-Nov-08, at 8:54 AM, Mark Miller wrote: Mark Miller wrote: Toby Cole wrote: Has anyone else experienced a deadlock when the DirectUpdateHandler2 does an autocommit? I'm using a recent snapshot from hudson (apache-solr-2008-11-12_08-06-21), and quite often when I'm loading data the server (tomcat 6) gets stuck at line 469 of DirectUpdateHandler2: // Check if there is a commit already scheduled for longer then this time if( pending != null && pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime ) Anyone got any enlightening tips? There is some inconsistent synchronization I think. Especially involving pending. Yuck <g> I would say there are problems with pending, autoCommitCount, and lastAddedTime. That alone could probably cause a deadlock (who knows), but it also seems somewhat possible that there is an issue with the heavy intermingling of locks (there are a bunch of locks to be had in that class). I haven't looked for evidence of that though - prob makes sense to fix those 3 guys and see if you get reports from there. autoCommitCount is written in a CommitTracker.synchronized block only. It is read to print stats in an unsynchronized fashion, which perhaps could be fixed, though I can't see how it could cause a problem. lastAddedTime is only written in a call path within a DirectUpdateHandler2.synchronized block. It is only read in a CommitTracker.synchronized block. It could read the wrong value, but I also don't see this causing a problem (a commit might fail to be scheduled). This could probably also be improved, but doesn't seem important. pending seems to be the issue. As long as commits are only triggered by autocommit, there is no issue as manipulation of pending is always performed inside CommitTracker.synchronized. But didCommit()/didRollback() could be called via manual commit, and pending is directly manipulated during DUH2.close(). I'm having trouble coming up with a plausible deadlock scenario, but this needs to be fixed.
It isn't as easy as synchronizing didCommit/didRollback, though--this would introduce definite deadlock scenarios. Mark, is there any chance you could post the thread dump for the deadlocked process? Do you issue manual commits during insertion? -Mike
Re: [jira] Commented: (SOLR-84) Logo Contests
On 14-Nov-08, at 8:54 AM, Doug Cutting (JIRA) wrote: [ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12647660#action_12647660 ] Doug Cutting commented on SOLR-84: -- I like https://issues.apache.org/jira/secure/attachment/12349896/logo-solr-e.jpg and https://issues.apache.org/jira/secure/attachment/12358494/sslogo-solr.jpg , because they're simple and scale down well. It should be possible to scale the logo, or a salient part of it, as small as a favicon (16x16) and still have it easily recognized. Most of the designs above require a lot of pixels to be recognizable. A good logo should be iconic more than textual--an abstract symbol. Often you can sample an element of a logo to form a favicon (like we do with Lucene's 'L'). So, when voting, think about whether there's an easily identifiable sample (e.g., is the typeface of the 'S' distinctive?). Lots of the designs do have distinctive suns that would make good favicons (after re-vectorizing; those gradients would not rescale nicely). -Mike
Re: ReentrantReadWriteLock in DUH2
On 6-Nov-08, at 7:48 AM, Koji Sekiguchi wrote: So that multiple threads can efficiently access the writer, but only one thread at a time does a commit. Adding docs with the writer is the 'read' and committing is the write. If I remember correctly. You remember correctly, Mark. Because of the lock, <add/> is blocked during <optimize/>, even if ConcurrentMergeScheduler is used, right? I'd like to know why <add/> should be blocked during <optimize/>. The core reason is laid out in the comment: // open a new searcher in the sync block to avoid opening it // after a deleteByQuery changed the index, or in between deletes // and adds of another commit being done. We want to open a searcher that corresponds exactly to the commit point (remember, an optimize is first and foremost a commit). I don't see why there couldn't be an optimize command that doesn't commit, if that is desired. -Mike
[jira] Commented: (SOLR-793) set a commit time bounds in the add command
[ https://issues.apache.org/jira/browse/SOLR-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12638825#action_12638825 ] Mike Klaas commented on SOLR-793: - I don't see any issue with the code: addedDocument is always called within a synchronized context anyway, after all. One question: right now you have it set to use the minimum of autocommit/maxTime and commitWithin on the update command. Might it be better to always use commitWithin, even if it is greater than a specified maxTime? This would allow the insertion of less important than normal docs (right now, it seems only useful for the more important case) set a commit time bounds in the add command - Key: SOLR-793 URL: https://issues.apache.org/jira/browse/SOLR-793 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: SOLR-793-commitWithin.patch, SOLR-793-commitWithin.patch Currently there are two options for how to handle committing documents: 1. the client explicitly starts the commit via <commit/> 2. set an auto commit value on the server -- clients can assume all documents will be committed within that time. However, this does not help in the case where the clients know what documents need updating quickly and others that could wait. I suggest adding: {code:xml} <add commitWithin="100" ...> {code} to the update syntax so the client can schedule commits explicitly.
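A sketch of the scheduling policy under discussion (a hypothetical CommitTracker shape, not the actual DUH2 patch): each add computes a commit deadline from the autocommit maxTime and any per-add commitWithin bound, and the pending commit is rescheduled only when the new deadline is tighter:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class CommitTracker {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long autoCommitMaxTimeMs;
    private ScheduledFuture<?> pending;

    public CommitTracker(long autoCommitMaxTimeMs) {
        this.autoCommitMaxTimeMs = autoCommitMaxTimeMs;
    }

    // Delay for a doc added with the given commitWithin bound;
    // <= 0 means the add carried no bound of its own. This is the
    // min(autocommit, commitWithin) behavior the comment questions.
    public static long effectiveDelayMs(long autoCommitMaxTimeMs, long commitWithinMs) {
        return commitWithinMs > 0
                ? Math.min(autoCommitMaxTimeMs, commitWithinMs)
                : autoCommitMaxTimeMs;
    }

    // Called for each added document.
    public synchronized void addedDocument(long commitWithinMs, Runnable doCommit) {
        long delay = effectiveDelayMs(autoCommitMaxTimeMs, commitWithinMs);
        // Reschedule only if no commit is pending, or the pending one is later.
        if (pending == null || pending.getDelay(TimeUnit.MILLISECONDS) > delay) {
            if (pending != null) pending.cancel(false);
            pending = scheduler.schedule(doCommit, delay, TimeUnit.MILLISECONDS);
        }
    }
}
```

Under Mike's alternative ("always use commitWithin"), effectiveDelayMs would return commitWithinMs whenever it is set, letting clients also flag docs as less urgent than the autocommit default.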
[jira] Commented: (SOLR-793) set a commit time bounds in the add command
[ https://issues.apache.org/jira/browse/SOLR-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12637173#action_12637173 ] Mike Klaas commented on SOLR-793: - Hey Ryan, I think this is good functionality and will take a look at the synchro stuff in the next day or so. I feel somewhat responsible, being the one who inflicted it on everyone :) set a commit time bounds in the add command - Key: SOLR-793 URL: https://issues.apache.org/jira/browse/SOLR-793 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: SOLR-793-commitWithin.patch, SOLR-793-commitWithin.patch Currently there are two options for how to handle committing documents: 1. the client explicitly starts the commit via <commit/> 2. set an auto commit value on the server -- clients can assume all documents will be committed within that time. However, this does not help in the case where the clients know what documents need updating quickly and others that could wait. I suggest adding: {code:xml} <add commitWithin="100" ...> {code} to the update syntax so the client can schedule commits explicitly.
Re: Setting Fix Version in JIRA
On 23-Sep-08, at 12:33 PM, Otis Gospodnetic wrote: Hi, When people add new issues to JIRA they most often don't set the Fix Version field. Would it not be better to have a default value for that field, so that new entries don't get forgotten when we filter by Fix Version looking for issues to fix for the next release? If every issue had Fix Version set we'd be able to schedule things better, give reporters and others more insight into when a particular item will be taken care of, etc. When we are ready for the release we'd just bump all unresolved issues to the next planned version (e.g. Solr 1.3.1 or 1.4 or Lucene 2.4 or 2.9) -1 It doesn't make sense to automatically schedule something to be fixed in the next version of the product. I would be +1 on automatically setting the fix version for the current unreleased version when an issue is resolved as fixed, though. -Mike
Re: Solr 1.3.0 Release Lessons Learned
On 22-Sep-08, at 10:34 AM, Shalin Shekhar Mangar wrote: I'd like to propose a more pro-active approach to release planning by the community. At any given time, let's have two versions in JIRA. Only those issues which a committer has assigned to himself should be in the first un-released version. All unassigned issues must be kept in the second un-released version. If a committer assigns and promotes an issue to the first un-released version, he should feel confident enough to resolve the issue one way or another within 3 months of the last release else he should mark it for the second version. At any given time, anybody can call a vote on releasing with the trunk features. If we feel confident enough and the list of resolved issues substantial enough, we can work according to our current way of release planning (deferring open issues, creating a branch, prioritizing bugs, putting up an RC and then release). I think that this is the right approach, but I don't think that it needs to be that complicated. For issues without the expectation of completion that you mention, it is fine to just not assign a version to the issue. It _would_ be useful, OTOH, to have a 2.0 version in JIRA for issues we know won't be resolved back-compatibly. -Mike
[jira] Commented: (SOLR-216) Improvements to solr.py
[ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12629981#action_12629981 ] Mike Klaas commented on SOLR-216: - That's great! Be sure to update http://wiki.apache.org/solr/SolPython as the project progresses. Improvements to solr.py --- Key: SOLR-216 URL: https://issues.apache.org/jira/browse/SOLR-216 Project: Solr Issue Type: Improvement Components: clients - python Affects Versions: 1.2 Reporter: Jason Cater Assignee: Mike Klaas Priority: Trivial Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py I've taken the original solr.py code and extended it to include higher-level functions. * Requires python 2.3+ * Supports SSL (https://) scheme * Conforms (mostly) to PEP 8 -- the Python Style Guide * Provides a high-level results object with implicit data type conversion * Supports batching of update commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-766) Remove python client from 1.3 distribution
Remove python client from 1.3 distribution -- Key: SOLR-766 URL: https://issues.apache.org/jira/browse/SOLR-766 Project: Solr Issue Type: Task Components: clients - python Affects Versions: 1.3 Reporter: Mike Klaas Assignee: Mike Klaas Priority: Blocker Fix For: 1.3 see solr-dev thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL PROTECTED] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-766) Remove python client from 1.3 distribution
[ https://issues.apache.org/jira/browse/SOLR-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12630004#action_12630004 ] Mike Klaas commented on SOLR-766: - JIRA seems to be not allowing me to upload a patch. Here is the text of the proposed README: Note: As of version 1.3, Solr no longer comes bundled with a Python client. The existing client was not sufficiently maintained or tested as development of Solr progressed, and committers felt that the code was not up to our usual high standards of release. The client bundled with previous versions of Solr will continue to be available indefinitely at: http://svn.apache.org/viewvc/lucene/solr/tags/release-1.2.0/client/python/ Please see http://wiki.apache.org/solr/SolPython for information on third-party Solr python clients. Remove python client from 1.3 distribution -- Key: SOLR-766 URL: https://issues.apache.org/jira/browse/SOLR-766 Project: Solr Issue Type: Task Components: clients - python Affects Versions: 1.3 Reporter: Mike Klaas Assignee: Mike Klaas Priority: Blocker Fix For: 1.3 see solr-dev thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL PROTECTED] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-766) Remove python client from 1.3 distribution
[ https://issues.apache.org/jira/browse/SOLR-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-766: Attachment: SOLR-766.patch Remove python client from 1.3 distribution -- Key: SOLR-766 URL: https://issues.apache.org/jira/browse/SOLR-766 Project: Solr Issue Type: Task Components: clients - python Affects Versions: 1.3 Reporter: Mike Klaas Assignee: Mike Klaas Priority: Blocker Fix For: 1.3 Attachments: SOLR-766.patch see solr-dev thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL PROTECTED] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Solr's use of Lucene's Compression field
Agreed. It was the simplest thing to do at the time, but it would definitely be preferable to offer the much faster lesser levels of compression. -Mike On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote: Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception , it occurred to me that we probably should refactor Solr's offering of compression. Currently, we rely on Field.COMPRESS from Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 , because it only offers the highest level of compression, which is also the slowest. Obviously, Solr needs to handle the compression on the server side. I think we should have Solr do the compression, allowing users to set the level of compression (maybe even make it pluggable to put in your own compression techniques) and then just use Lucene's binary field capability. Granted, this is lower priority since I doubt many people use compression to begin with, but, still it would be useful. -Grant
Re: Solr's use of Lucene's Compression field
Also I see that another Lucene bug (LUCENE-1374) was found relating to compressed fields in lucene (when we first added compressed field support to solr a lucene bug involving lazy-loaded fields and compression was uncovered, too). It would be good to change the implementation simply to avoid relying on a deprecated lucene feature that isn't well exercised in development. -Mike On 3-Sep-08, at 11:36 AM, Mike Klaas wrote: Agreed. It was the simplest thing to do at the time, but it would definitely be preferable to offer the much faster lesser levels of compression. -Mike On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote: Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception , it occurred to me that we probably should refactor Solr's offering of compression. Currently, we rely on Field.COMPRESS from Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 , because it only offers the highest level of compression, which is also the slowest. Obviously, Solr needs to handle the compression on the server side. I think we should have Solr do the compression, allowing users to set the level of compression (maybe even make it pluggable to put in your own compression techniques) and then just use Lucene's binary field capability. Granted, this is lower priority since I doubt many people use compression to begin with, but, still it would be useful. -Grant
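Grant's proposal above — have Solr compress stored fields itself, with a user-selectable level, and store the bytes via Lucene's binary field capability — comes down to choosing a `java.util.zip.Deflater` level. A minimal JDK-only sketch of the speed/size trade-off (class and method names are hypothetical, not Solr code):

```java
import java.util.zip.Deflater;

public class CompressDemo {
    // Hypothetical helper, not Solr code: compress one field value at a
    // caller-chosen level (Deflater.BEST_SPEED=1 .. Deflater.BEST_COMPRESSION=9).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        // One deflate() call suffices here: the buffer exceeds deflate's
        // worst-case output for inputs of this size.
        byte[] buf = new byte[input.length + 64];
        int n = deflater.deflate(buf);
        deflater.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    public static void main(String[] args) {
        byte[] doc = "some fairly repetitive stored field value ".repeat(200).getBytes();
        int fast = compress(doc, Deflater.BEST_SPEED).length;
        int best = compress(doc, Deflater.BEST_COMPRESSION).length;
        // Higher levels trade CPU time for (usually) smaller output.
        System.out.println("BEST_SPEED: " + fast + " bytes, BEST_COMPRESSION: " + best + " bytes");
    }
}
```

Field.COMPRESS hard-wires the equivalent of BEST_COMPRESSION; exposing the level (or a pluggable codec) is the flexibility being asked for here.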
[jira] Commented: (SOLR-739) Add support for OmitTf
[ https://issues.apache.org/jira/browse/SOLR-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12627049#action_12627049 ] Mike Klaas commented on SOLR-739: - Haven't looked at the patch, but defaulting to omitTf=true is backwards-incompatible (think multi-valued string fields) Add support for OmitTf -- Key: SOLR-739 URL: https://issues.apache.org/jira/browse/SOLR-739 Project: Solr Issue Type: New Feature Reporter: Mark Miller Priority: Minor Fix For: 1.4 Attachments: SOLR-739.patch Allow setting omitTf in the field schema. Default to true for all but text fields. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: 1.3 status
+1 for 1.3 RC. The idea of putting new issues in 1.3.1 has been tossed around a few times on this list in the last few weeks. I'm not sure how other people feel about this, but in my mind, 1.X.Y and 1.X.Z releases should be feature-identical, with later releases only containing bugfixes. If we have a bunch of cool features we want to release shortly, I'd be happy with releasing 1.4 quickly :) -Mike On 25-Aug-08, at 7:30 AM, Shalin Shekhar Mangar wrote: +1 for Lucene upgrade +1 for a release candidate. I think the newer issues can make it to 1.3.1 easily. We don't need to halt 1.3 for them. A general question -- how long does a Release Candidate phase last? On Mon, Aug 25, 2008 at 7:51 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: +1 for Lucene upgrade +1 for a release (I *think* none of the recent SOLR-7** issues have to go in 1.3) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Erik Hatcher [EMAIL PROTECTED] To: solr-dev@lucene.apache.org Sent: Monday, August 25, 2008 10:06:46 AM Subject: Re: 1.3 status On Aug 25, 2008, at 9:48 AM, Yonik Seeley wrote: Given that there are backward compat concerns with https://issues.apache.org/jira/browse/LUCENE-1142 perhaps we should update Lucene again before a release? +1 Erik -- Regards, Shalin Shekhar Mangar.
Re: [jira] Closed: (LUCENE-1363) sub task of reopen performance
Wow, that was a fast resolution to this issue :) -Mike On 22-Aug-08, at 12:46 AM, F.Y. (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] F.Y. closed LUCENE-1363. Resolution: Fixed sub task of reopen performance -- Key: LUCENE-1363 URL: https://issues.apache.org/jira/browse/LUCENE-1363 Project: Lucene - Java Issue Type: Sub-task Environment: win Reporter: F.Y. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (SOLR-474) audit docs for Spellchecker
[ https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12622677#action_12622677 ] Mike Klaas commented on SOLR-474: - The issue is more wikidocs vs. behaviour. I apologize I haven't gotten to this yet--I've been suffering from RSI the last month or so and it has been difficult to get non-work computer time. I'll take a look today. audit docs for Spellchecker --- Key: SOLR-474 URL: https://issues.apache.org/jira/browse/SOLR-474 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Hoss Man Assignee: Mike Klaas Fix For: 1.3 according to this troubling comment from Mike, the spellchecker handler javadocs (and wiki) may not reflect reality... http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712 {quote} Multi-word spell checking is available only with extendedResults=true, and only in trunk. I believe that the current javadocs are incorrect on this point. {quote} we should audit/fix this before 1.3 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-474) audit docs for Spellchecker
[ https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-474. - Resolution: Fixed I've verified the behaviour and updated the wiki page accordingly. audit docs for Spellchecker --- Key: SOLR-474 URL: https://issues.apache.org/jira/browse/SOLR-474 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Hoss Man Assignee: Mike Klaas Fix For: 1.3 according to this troubling comment from Mike, the spellchecker handler javadocs (and wiki) may not reflect reality... http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712 {quote} Multi-word spell checking is available only with extendedResults=true, and only in trunk. I believe that the current javadocs are incorrect on this point. {quote} we should audit/fix this before 1.3 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-216) Improvements to solr.py
[ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12622391#action_12622391 ] Mike Klaas commented on SOLR-216: - Hi Dariusz, There will almost certainly be no more releases of Solr 1.2. 1.3 will likely be released in less than a month. However, it is good that you published this code so that it can be found by other parties. I'd be much more interested in working toward a client that is compatible with the upcoming 1.3 release (it is unlikely that it can be included, but it can be distributed separately). cheers, -Mike Improvements to solr.py --- Key: SOLR-216 URL: https://issues.apache.org/jira/browse/SOLR-216 Project: Solr Issue Type: Improvement Components: clients - python Affects Versions: 1.2 Reporter: Jason Cater Assignee: Mike Klaas Priority: Trivial Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py I've taken the original solr.py code and extended it to include higher-level functions. * Requires python 2.3+ * Supports SSL (https://) scheme * Conforms (mostly) to PEP 8 -- the Python Style Guide * Provides a high-level results object with implicit data type conversion * Supports batching of update commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: ClientUtils escape query
Wouldn't you want to reverse all escaping in that case anyway? -Mike On 5-Aug-08, at 1:45 PM, Grant Ingersoll wrote: It's mainly a problem when one wants to display the thing later, I guess. -Grant On Aug 5, 2008, at 4:16 PM, Ryan McKinley wrote: That came after I spent a week increasing the list of things that need to be escaped one at a time (waiting for errors along the way...) Erik suggested I look at how the ruby client handles it... and I haven't seen any problem since then. Is there any problem with over escaping? I know it makes some things look funny. Perhaps there is a regex that will do any non-letter except ryan On Aug 5, 2008, at 8:28 AM, Grant Ingersoll wrote: ClientUtils.escapeQueryChars seems a bit aggressive to me in terms of what it escapes. It references http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping Special Characters, but doesn't explicitly escape them, instead opting for the more general \W regex. Thus, I'm noticing that chars that don't need to be escaped ( like / ) are being escaped. Anyone recall why this is? I suppose the problem comes in when one considers other query parsers, but maybe we should just mark this one as explicitly for use w/ the Lucene QP? -Grant
Re: AutoCommitTest
On 5-Aug-08, at 3:32 PM, Yonik Seeley wrote: AutoCommitTest was failing for me a good percentage of the time... the comment suggested that adding another doc after the commit callback would block until the new searcher was registered. But that's not the case. I've hacked the test for now to just sleep(500) after the commit callback. Fair enough. It is difficult for me to fix this more permanently, since I can't get it to fail on local machines. I deleted a bunch of email recently so I checked nabble--it seems that in the last month AutoCommitTest has failed once in Hudson (July 21) and once in the apache build (August 2). That isn't too bad, but I hope that your change eliminates those entirely. -Mike
Re: [jira] Issue Comment Edited: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost
On 29-Jul-08, at 3:20 AM, Andrew Savory wrote: Actually I'd argue that all such technical discussion would be better done on the mailing list rather than through JIRA. Mail clients are designed for threaded discussions far better than JIRA's web GUI. And JIRA's posting back to the list with bq. makes most responses impossible to follow. Excessive use of JIRA feels like a community antipattern to me. +1 -Mike
[jira] Commented: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost
[ https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617512#action_12617512 ] Mike Klaas commented on SOLR-665: - I haven't looked at the proposed code at all, but it _is_ possible to design this kind of datastructure, with much care: http://www.ddj.com/hpc-high-performance-computing/208801974 FIFO Cache (Unsynchronized): 9x times performance boost --- Key: SOLR-665 URL: https://issues.apache.org/jira/browse/SOLR-665 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Environment: JRockit R27 (Java 6) Reporter: Fuad Efendi Attachments: FIFOCache.java Original Estimate: 672h Remaining Estimate: 672h Attached is modified version of LRUCache where 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that reordering/true (performance bottleneck of LRU) is replaced to insertion-order/false (so that it became FIFO) 2. Almost all (absolutely unnecessary) synchronized statements commented out See discussion at http://www.nabble.com/LRUCache---synchronized%21--td16439831.html Performance metrics (taken from SOLR Admin): LRU Requests: 7638 Average Time-Per-Request: 15300 Average Request-per-Second: 0.06 FIFO: Requests: 3355 Average Time-Per-Request: 1610 Average Request-per-Second: 0.11 Performance increased 9 times which roughly corresponds to a number of CPU in a system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org) Current number of documents: 7494689 name: filterCache class:org.apache.solr.search.LRUCache version: 1.0 description: LRU Cache(maxSize=1000, initialSize=1000) stats:lookups : 15966954582 hits : 16391851546 hitratio : 0.102 inserts : 4246120 evictions : 0 size : 2668705 cumulative_lookups : 16415839763 cumulative_hits : 16411608101 cumulative_hitratio : 0.99 cumulative_inserts : 4246246 cumulative_evictions : 0 Thanks -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
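The patch's central change — `new LinkedHashMap(initialSize, 0.75f, false)` — hinges on the third constructor argument, `accessOrder`. A small JDK-only sketch (class and method names hypothetical) showing how that one flag turns the same bounded map from LRU eviction into FIFO eviction:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheOrderDemo {
    // A bounded map: accessOrder=true evicts the least-recently-used entry,
    // accessOrder=false (the patch's choice) evicts the first-inserted entry.
    static <K, V> Map<K, V> bounded(final int maxSize, boolean accessOrder) {
        return new LinkedHashMap<K, V>(maxSize, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> lru = bounded(2, true);
        Map<String, Integer> fifo = bounded(2, false);
        for (Map<String, Integer> m : java.util.List.of(lru, fifo)) {
            m.put("a", 1);
            m.put("b", 2);
            m.get("a");    // touching "a" only matters under access ordering
            m.put("c", 3); // forces one eviction
        }
        System.out.println("LRU keeps:  " + lru.keySet());   // [a, c] -- "b" was least recently used
        System.out.println("FIFO keeps: " + fifo.keySet());  // [b, c] -- "a" was inserted first
    }
}
```

The performance claim in the issue rests on the second point: with access ordering, every `get()` mutates the linked list, so reads cannot safely run unsynchronized; with insertion ordering, `get()` is a pure read.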
[jira] Commented: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost
[ https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617549#action_12617549 ] Mike Klaas commented on SOLR-665: - {quote}We may simply use java.util.concurrent.locks instead of heavy synchronized... we may also use Executor framework instead of single-thread faceting... We may even base SOLR on Apache MINA project.{quote} Simply replacing synchronized with java.util.concurrent.locks doesn't increase performance. There needs to be a specific strategy for employing these locks in a way that makes sense. For instance, one idea would be to create a read/write lock with the put()'s covered by write and get()'s covered by read. This would allow multiple parallel reads and will be thread-safe. Another is to create something like ConcurrentLinkedHashMap. These strategies should be tested before trying to create a lock-free get() version, which, if even possible, would rely deeply on the implementation (such a structure would have to be created from scratch, I believe). I'd expect anyone that is able to create such a thing to be familiar enough with memory barriers and such issues to be able to deeply explain the problems with double-checked locking off the top of their head (and immediately see such problems in other code). FIFO Cache (Unsynchronized): 9x times performance boost --- Key: SOLR-665 URL: https://issues.apache.org/jira/browse/SOLR-665 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Environment: JRockit R27 (Java 6) Reporter: Fuad Efendi Attachments: FIFOCache.java Original Estimate: 672h Remaining Estimate: 672h Attached is modified version of LRUCache where 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that reordering/true (performance bottleneck of LRU) is replaced to insertion-order/false (so that it became FIFO) 2.
Almost all (absolutely unnecessary) synchronized statements commented out See discussion at http://www.nabble.com/LRUCache---synchronized%21--td16439831.html Performance metrics (taken from SOLR Admin): LRU Requests: 7638 Average Time-Per-Request: 15300 Average Request-per-Second: 0.06 FIFO: Requests: 3355 Average Time-Per-Request: 1610 Average Request-per-Second: 0.11 Performance increased 9 times which roughly corresponds to a number of CPU in a system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org) Current number of documents: 7494689 name: filterCache class:org.apache.solr.search.LRUCache version: 1.0 description: LRU Cache(maxSize=1000, initialSize=1000) stats:lookups : 15966954582 hits : 16391851546 hitratio : 0.102 inserts : 4246120 evictions : 0 size : 2668705 cumulative_lookups : 16415839763 cumulative_hits : 16411608101 cumulative_hitratio : 0.99 cumulative_inserts : 4246246 cumulative_evictions : 0 Thanks -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
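The read/write-lock strategy Mike suggests — put() under the write lock, get() under the read lock — can be sketched with the JDK alone. Note this is safe only for the FIFO (insertion-ordered) variant: with accessOrder=true, get() mutates the map, so a read lock would not suffice. Class name hypothetical; this is a sketch of the suggested strategy, not the SOLR-665 patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RWLockFifoCache<K, V> {
    private final Map<K, V> map;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public RWLockFifoCache(final int maxSize) {
        // Insertion order (accessOrder=false): get() never mutates the map,
        // which is exactly what makes guarding it with only a *read* lock safe.
        this.map = new LinkedHashMap<K, V>(maxSize, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public V get(K key) {
        lock.readLock().lock();   // many readers may proceed in parallel
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(K key, V value) {
        lock.writeLock().lock();  // writers are exclusive
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        RWLockFifoCache<String, Integer> cache = new RWLockFifoCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);        // evicts "a" (first inserted)
        System.out.println(cache.get("a") + " " + cache.get("c"));
    }
}
```

Unlike the patch's "comment out the synchronized blocks" approach, this keeps reads parallel while remaining thread-safe under concurrent puts.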
[jira] Commented: (SOLR-474) audit docs for Spellchecker
[ https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617580#action_12617580 ] Mike Klaas commented on SOLR-474: - I will look at this before release. audit docs for Spellchecker --- Key: SOLR-474 URL: https://issues.apache.org/jira/browse/SOLR-474 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Hoss Man Assignee: Mike Klaas Fix For: 1.3 according to this troubling comment from Mike, the spellchecker handler javadocs (and wiki) may not reflect reality... http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712 {quote} Multi-word spell checking is available only with extendedResults=true, and only in trunk. I believe that the current javadocs are incorrect on this point. {quote} we should audit/fix this before 1.3 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12616729#action_12616729 ] Mike Klaas commented on SOLR-139: - [quote]David - storing all data in the search index can be a problem because it can get BIG. Imagine if nutch stored the raw content in the lucene index? (I may be wrong on this) even with Lazy loading, there is a query time cost to having stored fields.[/quote] Splitting it out into another store is much better at scale. A distinct lucene index works relatively well. Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley Assignee: Ryan McKinley Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. 
While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Defining properties/using expressions in {multicore, config, schema} files
On 21-Jul-08, at 10:48 AM, Henrib wrote: I posted a new patch in solr-350 (solr-350-properties.patch) that allows defining properties in multicore.xml and using them in expressions in config/schema files. This brings a lot of flexibility to configuration. I apologize for doubling the JIRA post; Solr-350 being closed, I just wanted to ensure anyone interested in the feature could try/comment/review etc. Perhaps opening a new issue would be best? cheers, -Mike
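The feature Henrib describes — ${name} expressions in multicore/config/schema files — is at its core property substitution over the file text. A toy JDK-only sketch (class and method names hypothetical, not the SOLR-350 patch):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyExpansionDemo {
    private static final Pattern PROP = Pattern.compile("\\$\\{([^}]+)\\}");

    // Toy version of ${name} expansion in a config file: known properties are
    // substituted, unknown ones are left intact for a later resolution pass.
    static String expand(String text, Map<String, String> props) {
        Matcher m = PROP.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("solr.data.dir", "/var/solr/data");
        System.out.println(expand("<dataDir>${solr.data.dir}</dataDir>", props));
        // -> <dataDir>/var/solr/data</dataDir>
    }
}
```

The real patch additionally layers property definitions per core; this sketch only shows the substitution step.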
Re: Welcome Shalin Shekhar Mangar
Welcome aboard, Shalin! -Mike On 19-Jul-08, at 12:01 PM, Shalin Shekhar Mangar wrote: Thanks! I work at AOL in Bangalore as part of a small team which gets to work on a variety of (very cool!) stuff. Though my involvement started when we decided to contribute part of our work to Solr (DataImportHandler), it soon became a personal passion and has remained so since. AOL continues to encourage and support me for which I'm thankful. I'm very happy to be a part of this community and I'm looking forward to working more closely with you all. On Sat, Jul 19, 2008 at 1:12 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: I am pleased to announce that the Lucene PMC has named Shalin Shekhar Mangar as a Solr committer. Shalin has already contributed numerous patches to the community as well as answers and help on the user list. Shalin, tradition has it that new committers introduce themselves a little bit, so feel free to drop a note about where you work, etc. if you are so inclined. Thanks, Grant -- Regards, Shalin Shekhar Mangar.
[jira] Updated: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
[ https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-610: Fix Version/s: 1.3 Add support for hl.maxAnalyzedChars=-1 to highlight the whole field --- Key: SOLR-610 URL: https://issues.apache.org/jira/browse/SOLR-610 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch Add support for specifying negative values for the hl.maxAnalyzedChars parameter to be able to highlight the whole field without having to know its size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
[ https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-610. - Resolution: Fixed committed. Thanks Lars! Add support for hl.maxAnalyzedChars=-1 to highlight the whole field --- Key: SOLR-610 URL: https://issues.apache.org/jira/browse/SOLR-610 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch Add support for specifying negative values for the hl.maxAnalyzedChars parameter to be able to highlight the whole field without having to know its size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-556. - Resolution: Fixed committed as part of SOLR-610. thanks Lars! Highlighting of multi-valued fields returns snippets which span multiple different values - Key: SOLR-556 URL: https://issues.apache.org/jira/browse/SOLR-556 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-highlight-multivalued.patch, solr-highlight-multivalued-example.xml When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet fooemba/emr. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet emoo/em regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all. To reproduce the problem, I've used the following steps: * create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though) * search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though) * highlighted snippets should show effects described above -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
[ https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas reassigned SOLR-610: --- Assignee: Mike Klaas Add support for hl.maxAnalyzedChars=-1 to highlight the whole field --- Key: SOLR-610 URL: https://issues.apache.org/jira/browse/SOLR-610 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch Add support for specifying negative values for the hl.maxAnalyzedChars parameter to be able to highlight the whole field without having to know its size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
[ https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12608878#action_12608878 ] Mike Klaas commented on SOLR-610: - Hi Lars, I was planning on committing SOLR-556. Would you rather I commit that first, or produce a unified patch instead? -Mike Add support for hl.maxAnalyzedChars=-1 to highlight the whole field --- Key: SOLR-610 URL: https://issues.apache.org/jira/browse/SOLR-610 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Priority: Minor Attachments: SOLR-610-maxanalyzed.patch Add support for specifying negative values for the hl.maxAnalyzedChars parameter to be able to highlight the whole field without having to know its size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
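The semantics SOLR-610 asks for are simple: a negative hl.maxAnalyzedChars means "analyze the whole field". A hedged sketch of that interpretation (class and method names hypothetical, not the actual Solr highlighter code):

```java
public class MaxAnalyzedDemo {
    // Interpret hl.maxAnalyzedChars: a negative value is a sentinel meaning
    // "analyze the whole field", regardless of its length.
    static int effectiveLimit(int maxAnalyzedChars, String fieldValue) {
        return maxAnalyzedChars < 0
                ? fieldValue.length()
                : Math.min(maxAnalyzedChars, fieldValue.length());
    }

    public static void main(String[] args) {
        String field = "a long stored field value whose size the client does not know";
        System.out.println(effectiveLimit(10, field));  // capped at 10 chars
        System.out.println(effectiveLimit(-1, field));  // whole field
    }
}
```

The point of the sentinel is that clients need not know the stored field's size up front to request full-field highlighting.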
Re: per-field similarity
On 24-Jun-08, at 1:28 PM, Yonik Seeley wrote: Something to consider for Lucene 3 is to have something to retrieve Similarity per-field rather than passing the field name into some functions... +1 I've felt that this was the proper (and more useful) way to do things for a long time (http://markmail.org/message/56bk6wrbwallyjvr) -Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
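The idea in this thread — resolve the Similarity once per field rather than passing the field name into individual scoring methods — might look roughly like the following (all names hypothetical; Lucene's real Similarity API is much richer than this one-method stand-in):

```java
import java.util.Map;

public class PerFieldSimilarityDemo {
    // Hypothetical one-method stand-in for Lucene's Similarity.
    interface Similarity { float lengthNorm(int numTokens); }

    static final Similarity NO_NORMS = n -> 1.0f;
    static final Similarity SQRT_NORMS = n -> (float) (1.0 / Math.sqrt(n));
    static final Map<String, Similarity> BY_FIELD = Map.of("id", NO_NORMS);

    // The proposal's shape: callers fetch the per-field Similarity up front,
    // so scoring methods no longer need a field-name parameter.
    static Similarity forField(String field) {
        return BY_FIELD.getOrDefault(field, SQRT_NORMS);
    }

    public static void main(String[] args) {
        System.out.println(forField("id").lengthNorm(100));    // identifier field: no length norm
        System.out.println(forField("body").lengthNorm(100));  // text field: 1/sqrt(length)
    }
}
```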
Re: [jira] Updated: (LUCENE-1314) IndexReader.reopen(boolean force)
On 23-Jun-08, at 10:14 AM, Jason Rutherglen (JIRA) wrote: Does anyone know how to turn off Eclipse automatically changing the import statements? I am not making it reformat but if I edit some code in a file it sees fit to reformat the imports. http://www.google.com/search?q=turn%20off%20eclipse%20changing%20import%20statements I'm running into a problem where Organize Imports is removing all of my import statements. I had to turn off Keep Imports Organized because I noticed that ... -Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: XSS in Solr admin interface
On 19-Jun-08, at 11:17 PM, Nicob wrote: Le jeudi 19 juin 2008 à 19:21 -0700, Mike Klaas a écrit : Fixed in r669766. I checked the patch and it's correctly patching this XSS. Thanks to the dev team ! Thanks for the report! -Mike
Re: XSS in Solr admin interface
On 19-Jun-08, at 5:47 PM, Yonik Seeley wrote: On Thu, Jun 19, 2008 at 7:42 PM, Nicob [EMAIL PROTECTED] wrote: while testing the Solr search engine, I found a XSS vulnerability in its administration interface. I wrote to [EMAIL PROTECTED], but I wonder if this list could be a better place to find a security contact of the Solr project. This is definitely the right list. Is this vulnerability in the current dev version of solr? Fixed in r669766. -Mike
[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605403#action_12605403 ] Mike Klaas commented on SOLR-14: Note that it is very easy to use an external TokenFilter, so you could just cp WDF into your own class and make the changes. (Though I'm not saying that this _shouldn't_ make it in for 1.3) Add the ability to preserve the original term when using WordDelimiterFilter Key: SOLR-14 URL: https://issues.apache.org/jira/browse/SOLR-14 Project: Solr Issue Type: Improvement Components: search Reporter: Richard Trey Hyde Attachments: TokenizerFactory.java, WordDelimiterFilter.patch, WordDelimiterFilter.patch When doing prefix searching, you need to hang on to the original term, otherwise you'll miss many matches you should be making. Data: ABC-12345 WordDelimiterFilter may change this into ABC 12345 ABC12345 A user may enter a search such as ABC\-123* Which will fail to find a match given the above scenario. The attached patch will allow the use of the preserveOriginal option to WordDelimiterFilter and will analyse as ABC 12345 ABC12345 ABC-12345 in which case we will get a positive match. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
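The token algebra the issue describes (split parts, the catenated form, and the preserved original) can be illustrated with a self-contained toy. This simplification splits only on non-alphanumeric characters; the real WordDelimiterFilter also splits on case changes and letter/digit transitions:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration of WordDelimiterFilter's preserveOriginal option, not the
// Solr implementation: "ABC-12345" yields the parts ("ABC", "12345"), the
// concatenation ("ABC12345"), and, with preserveOriginal, the untouched input.
public class WordDelimiterDemo {

    static Set<String> tokens(String input, boolean preserveOriginal) {
        Set<String> out = new LinkedHashSet<>();
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {   // delimiter ends the current part
                parts.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) parts.add(current.toString());
        out.addAll(parts);
        if (parts.size() > 1) out.add(String.join("", parts)); // catenated form
        if (preserveOriginal) out.add(input);                  // keep original term
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("ABC-12345", false)); // [ABC, 12345, ABC12345]
        System.out.println(tokens("ABC-12345", true));  // [ABC, 12345, ABC12345, ABC-12345]
    }
}
```

With preserveOriginal, a prefix query like ABC\-123* can match the indexed term ABC-12345, which is exactly the failure mode the patch addresses.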
[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605410#action_12605410 ] Mike Klaas commented on SOLR-14: Also, voting for an issue is a good way to increase its visibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603780#action_12603780 ] Mike Klaas commented on SOLR-556: - Thanks for the patch, Lars. I think that the basic approach is sound, though I am a little nervous about the performance implications (especially in the case of phrase highlighting, where we spin up an entirely new spanhighlighter for each value in a multi-valued field). I wonder if I am the only one who highlights large text fields composed of dozens of individual values? Highlighting of multi-valued fields returns snippets which span multiple different values - Key: SOLR-556 URL: https://issues.apache.org/jira/browse/SOLR-556 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-highlight-multivalued.patch, solr-highlight-multivalued-example.xml When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet foo<em>ba</em>r. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet <em>oo</em> regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all. 
To reproduce the problem, I've used the following steps: * create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though) * search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though) * highlighted snippets should show effects described above -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603785#action_12603785 ] Mike Klaas commented on SOLR-556: - Hey Lars, Yeah, I'm talking about highlighting 15kB of text in 100-200 character chunks. Maybe I can whip up a perf test for this soon. The reason we probably see this issue differently is that the incorrect behaviour is quite minor for most users (perhaps a bit of punctuation leaking from value to value at most). One way to correct what you are seeing is to use a tokenizer that creates tokens out of the CJK characters, or things on boundaries. In your case, inserting a fake token when encountering a right bracket [)] would fix the problem, I think. Nevertheless, I think I will probably end up committing your patch after pondering it some more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Solr Maven Artifacts
As someone who is completely ignorant (and admittedly, somewhat willfully so) of the java enterprise world, I was hoping that someone more savvy in the ways of maven would step in here. It is even unclear to me what having the project in a Maven repository means for people, or why it would be convenient. Based on the link you sent, it seems that a few things are necessary for this to proceed, like a maven project descriptor for Solr (or is that already done?). That said, I'm +1 on steps to better propagate Solr, even if I don't think that I am the best person to effectuate those steps. -Mike On 9-Jun-08, at 12:58 AM, Andrew Savory wrote: Hi, Would any of the solr devs care to comment? It would be extremely useful to have maven artifacts published for those building apps based on Solr 1.2, and it would help prepare the way for releasing Solr 1.3 maven artifacts. 2008/6/5 Andrew Savory [EMAIL PROTECTED]: Hi, 2008/6/4 Andrew Savory [EMAIL PROTECTED]: I see from http://issues.apache.org/jira/browse/SOLR-19 that some tentative work has been done on mavenisation of solr, and from https://issues.apache.org/jira/browse/SOLR-586 that discussion of publishing maven artifacts ... is it possible to push solr 1.2 maven artifacts out to the repo? More specifically, would someone with sufficient privileges (Yonik?) 
be willing to do the following (from [1]):
mkdir -p org.apache.solr/jars
grab the solr-1.2 release (or svn co tags/release-1.2.0, but then you need to edit build.xml to fix the version string that seems to have accidentally been updated before the release tag was made, changing it to <property name="version" value="1.2.1-dev" />)
tar xzvf apache-solr-1.2.0.tar.gz
cp apache-solr-1.2.0/dist/apache-solr-1.2.0.jar org.apache.solr/jars/
cd into org.apache.solr/jars and create md5 and sha1 checksums of apache-solr-1.2.0.jar:
openssl md5 apache-solr-1.2.0.jar > apache-solr-1.2.0.jar.md5
openssl sha apache-solr-1.2.0.jar > apache-solr-1.2.0.jar.sha1
sign the release:
gpg --armor --output apache-solr-1.2.0.jar.asc --detach-sig apache-solr-1.2.0.jar
cd ../ and scp it onto people.apache.org:
scp -r org.apache.solr [EMAIL PROTECTED]:/www/people.apache.org/repo/m1-ibiblio-rsync-repository/
check permissions:
cd /www/people.apache.org/repo/m1-ibiblio-rsync-repository/org.apache.solr
chgrp -R apcvs *
chmod -R g+w *
I could do it but I suspect that would be overstepping the bounds of a non-committer :-) This will make it easier for anyone to use solr from within maven. I'll file a patch to automate whatever can be automated from our ant build so this is easier for the 1.3 release. If people agree that publishing maven artifacts is a good idea, I'll happily update http://wiki.apache.org/solr/HowToRelease to point to the relevant information too. [1] http://www.apache.org/dev/release-publishing.html#maven-repo Andrew. -- [EMAIL PROTECTED] / [EMAIL PROTECTED] http://www.andrewsavory.com/
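The checksum step in the recipe above could also be done with the JDK's MessageDigest instead of shelling out to openssl; a minimal sketch (file names from the recipe are illustrative, and this hashes an in-memory stand-in rather than the actual release jar):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the md5/sha1 checksum step using the JDK instead of the openssl CLI.
public class ChecksumDemo {

    // Hex-encoded digest of a byte array, e.g. "MD5" or "SHA-1".
    static String hexDigest(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllBytes(Path.of("apache-solr-1.2.0.jar")).
        byte[] jar = "example jar bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest("MD5", jar));
        System.out.println(hexDigest("SHA-1", jar));
    }
}
```

The resulting hex strings are what would be written to the .md5 and .sha1 files next to the jar.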
[jira] Commented: (SOLR-536) Automatic binding of results to Beans (for solrj)
[ https://issues.apache.org/jira/browse/SOLR-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602744#action_12602744 ] Mike Klaas commented on SOLR-536: - This is expensive: private final Map<Class, List<DocField>> infocache = Collections.synchronizedMap( new HashMap<Class, List<DocField>>() ); Let us make it: private final Map<Class, List<DocField>> infocache = new ConcurrentHashMap<Class, List<DocField>>(); Expensive? I'd expect the synchronizedMap to be faster and more memory compact. The ConcurrentHashMap is definitely more concurrent, though. Automatic binding of results to Beans (for solrj) - Key: SOLR-536 URL: https://issues.apache.org/jira/browse/SOLR-536 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.3 Reporter: Noble Paul Assignee: Ryan McKinley Priority: Minor Fix For: 1.3 Attachments: SOLR-536.patch, SOLR-536.patch, SOLR-536.patch As we are using java5, we can use annotations to bind SolrDocument to java beans directly. This can make the usage of solrj a bit simpler. The QueryResponse class in solrj can have an extra method as follows: public <T> List<T> getResultBeans(Class<T> klass) and the bean can have annotations as: class MyBean { @Field("id") // name is optional String id; @Field("category") List<String> categories } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
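The two caching strategies debated in the comment, sketched side by side; DocField here is a stand-in for the patch's field-binding metadata (the name comes from the patch, the body is a placeholder):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The trade-off under discussion: synchronizedMap serializes every read and
// write on one lock but is compact; ConcurrentHashMap lets readers proceed
// concurrently at the cost of a heavier structure.
public class BindingCacheDemo {

    static class DocField { String name; }

    // Every get() and put() contends on the same monitor.
    static final Map<Class<?>, List<DocField>> syncCache =
            Collections.synchronizedMap(new HashMap<>());

    // Concurrent reads never block each other.
    static final Map<Class<?>, List<DocField>> concurrentCache =
            new ConcurrentHashMap<>();

    public static void main(String[] args) {
        concurrentCache.computeIfAbsent(String.class, k -> List.of(new DocField()));
        System.out.println(concurrentCache.get(String.class).size()); // 1
    }
}
```

For a cache that is written once per bean class and then read on every query, the concurrent variant is the safer default, which matches the suggestion in the comment.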
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602828#action_12602828 ] Mike Klaas commented on SOLR-572: - [quote]Another use case is where Solr is used with indices that are not indices for a narrow domain or that don't have nice, clean, short fields that can be used for populating the SC index. For example, if the index consists of a pile of web pages, I don't think I'd want to use their data (not even their titles) to populate the SC index. I'd really want just a plain dictionary-powered SCRH.[/quote] It works great, actually. That way you get all the abbreviations, jargon, proper names, etc. Thresholding helps prevent most of the cruft from appearing in the index. Spell Checker as a Search Component --- Key: SOLR-572 URL: https://issues.apache.org/jira/browse/SOLR-572 Project: Solr Issue Type: New Feature Components: spellchecker Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Assignee: Grant Ingersoll Priority: Minor Fix For: 1.3 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch Expose the Lucene contrib SpellChecker as a Search Component. 
Provide the following features: * Allow creating a spell index on a given field and make it possible to have multiple spell indices -- one for each field * Give suggestions on a per-field basis * Given a multi-word query, give only one consistent suggestion * Process the query with the same analyzer specified for the source field and process each token separately * Allow the user to specify minimum length for a token (optional) Consistency criteria for a multi-word query can consist of the following: * Preserve the correct words in the original query as it is * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-284: Affects Version/s: (was: 1.3) Removing from 1.3. No committer has taken ownership. (It might make sense as a contrib, but I can see the argument for not duplicating tika) Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, source.zip, test-files.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, or Excel document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-435) QParser must validate existance/absense of q parameter
[ https://issues.apache.org/jira/browse/SOLR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-435: Fix Version/s: (was: 1.3) QParser must validate existance/absense of q parameter Key: SOLR-435 URL: https://issues.apache.org/jira/browse/SOLR-435 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Ryan McKinley Each QParser should check if q exists or not. For some it will be required others not. currently it throws a null pointer: {code} java.lang.NullPointerException at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36) at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104) at org.apache.solr.search.QParser.getQuery(QParser.java:80) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67) at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150) ... {code} see: http://www.nabble.com/query-parsing-error-to14124285.html#a14140108 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
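The fix this issue asks for amounts to a fail-fast guard before parsing, so a missing q produces a clear error instead of the NullPointerException in splitSmart. A minimal sketch with illustrative names (not Solr's actual exception or parser types):

```java
// Sketch of validating the q parameter before parsing. BadRequestException
// stands in for whatever 400-style error Solr would surface to the client.
public class QParserGuardDemo {

    static class BadRequestException extends RuntimeException {
        BadRequestException(String msg) { super(msg); }
    }

    static String[] parse(String q) {
        if (q == null || q.trim().isEmpty()) {
            // Fail fast with a descriptive message instead of an NPE deep
            // inside the query parser.
            throw new BadRequestException("Missing required parameter: q");
        }
        return q.trim().split("\\s+"); // stand-in for the real query parsing
    }

    public static void main(String[] args) {
        System.out.println(parse("solr lucene").length); // 2
        try {
            parse(null);
        } catch (BadRequestException e) {
            System.out.println(e.getMessage()); // Missing required parameter: q
        }
    }
}
```

As the issue notes, some parsers require q and others do not, so the guard belongs in each QParser rather than in shared code.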
[jira] Updated: (SOLR-433) MultiCore and SpellChecker replication
[ https://issues.apache.org/jira/browse/SOLR-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-433: Fix Version/s: (was: 1.3) MultiCore and SpellChecker replication -- Key: SOLR-433 URL: https://issues.apache.org/jira/browse/SOLR-433 Project: Solr Issue Type: Improvement Components: replication, spellchecker Affects Versions: 1.3 Reporter: Otis Gospodnetic Attachments: RunExecutableListener.patch, solr-433.patch, spellindexfix.patch With MultiCore functionality coming along, it looks like we'll need to be able to: A) snapshot each core's index directory, and B) replicate any and all cores' complete data directories, not just their index directories. Pulled from the spellchecker and multi-core index replication thread - http://markmail.org/message/pj2rjzegifd6zm7m Otis: I think that makes sense - distribute everything for a given core, not just its index. And the spellchecker could then also have its data dir (and only index/ underneath really) and be replicated in the same fashion. Right? Ryan: Yes, that was my thought. If an arbitrary directory could be distributed, then you could have /path/to/dist/index/... /path/to/dist/spelling-index/... /path/to/dist/foo and that would all get put into a snapshot. This would also let you put multiple cores within a single distribution: /path/to/dist/core0/index/... /path/to/dist/core0/spelling-index/... /path/to/dist/core0/foo /path/to/dist/core1/index/... /path/to/dist/core1/spelling-index/... /path/to/dist/core1/foo -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-351) external value source
[ https://issues.apache.org/jira/browse/SOLR-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-351: Fix Version/s: (was: 1.3) external value source - Key: SOLR-351 URL: https://issues.apache.org/jira/browse/SOLR-351 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Attachments: ExternalFileField.patch Need a way to rapidly do a bulk update of a single field for use as a component in a function query (no need to be able to search on it). Idea: create an ExternalValueSource fieldType that reads its values from a file. The file could be simple id,val records, and stored in the index directory so it would get replicated. Values could optionally be updated more often than the searcher (hashCode/equals should take this into account to prevent caching issues). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
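The "simple id,val records" format proposed above could be parsed into an in-memory map along these lines; this is a guess at the described format, not the committed ExternalFileField code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of parsing the proposed external value file: one "id,val" record per
// line, mapping a document id to a float usable in a function query.
public class ExternalValuesDemo {

    static Map<String, Float> parse(String fileContents) {
        Map<String, Float> values = new HashMap<>();
        for (String line : fileContents.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            int comma = line.indexOf(',');
            if (comma < 0) continue; // skip malformed records
            values.put(line.substring(0, comma),
                       Float.parseFloat(line.substring(comma + 1)));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Float> v = parse("doc1,0.5\ndoc2,2.75\n");
        System.out.println(v.get("doc2")); // 2.75
    }
}
```

Keeping the file in the index directory, as the issue suggests, means it rides along with normal index replication for free.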
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-284: Fix Version/s: (was: 1.3) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-484) Solr Website changes
[ https://issues.apache.org/jira/browse/SOLR-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-484: Fix Version/s: (was: 1.3) Solr Website changes Key: SOLR-484 URL: https://issues.apache.org/jira/browse/SOLR-484 Project: Solr Issue Type: Bug Components: documentation Reporter: Grant Ingersoll Priority: Minor In looking at the Solr website it has many of the same issues that Lucene Java did when it comes to ASF policies about nightly builds, etc. concerning the Javadocs See http://lucene.markmail.org/message/a7k7kujxkhwjwfy6?q=nightly+developer+releases+list:org%2Eapache%2Elucene%2Ejava-dev+from:%22Doug+Cutting+(JIRA)%22page=1 and http://lucene.markmail.org/message/vaks6omed4l6buth?q=nightly+developer+releases+list:org%2Eapache%2Elucene%2Ejava-dev+from:%22Doug+Cutting+(JIRA)%22page=1 This would suggest a change like Hadoop and Lucene Java did to separate out the main site, release docs (javadocs, any other?) and developer resources. Currently the javadocs on the main page are the nightly and should be made less prominent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-84) New Solr logo?
[ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-84: --- Fix Version/s: (was: 1.3) New Solr logo? -- Key: SOLR-84 URL: https://issues.apache.org/jira/browse/SOLR-84 Project: Solr Issue Type: Improvement Reporter: Bertrand Delacretaz Priority: Minor Attachments: logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg, logo-solr-source-files-take2.zip, solr-84-source-files.zip, solr-f.jpg, solr-logo-20061214.jpg, solr-logo-20061218.JPG, solr-logo-20070124.JPG, solr-nick.gif, solr.jpg, sslogo-solr-flare.jpg, sslogo-solr.jpg, sslogo-solr2-flare.jpg, sslogo-solr2.jpg, sslogo-solr3.jpg Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at here) sarraux-dessous.ch) has reworked his logo proposal to be more solar. This can either be the start of a logo contest, or if people like it we could adopt it. The gradients can make it a bit hard to integrate, not sure if this is really a problem. WDYT? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-410) Audit the new ResponseBuilder class
[ https://issues.apache.org/jira/browse/SOLR-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602834#action_12602834 ] Mike Klaas commented on SOLR-410: - Ryan, can this be closed? Audit the new ResponseBuilder class --- Key: SOLR-410 URL: https://issues.apache.org/jira/browse/SOLR-410 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Reporter: Ryan McKinley Fix For: 1.3 In SOLR-281, we added a ResponseBuilder class to help search components communicate with one another. Before releasing 1.3, we need to make sure this is the best design and that it is an interface we can support in the future. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-243) Create a hook to allow custom code to create custom IndexReaders
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-243: Do we still want to target 1.3 here? (Seems like there is a lot to do before it is commit-worthy, based on the comments) Create a hook to allow custom code to create custom IndexReaders Key: SOLR-243 URL: https://issues.apache.org/jira/browse/SOLR-243 Project: Solr Issue Type: Improvement Components: search Environment: Solr core Reporter: John Wang Assignee: Hoss Man Fix For: 1.3 Attachments: indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch I have a customized IndexReader and I want to write a Solr plugin to use my derived IndexReader implementation. Currently IndexReader instantiation is hard coded to be: IndexReader.open(path) It would be really useful if this is done through a pluggable factory that can be configured, e.g.: interface IndexReaderFactory { IndexReader newReader(String name, String path); } the default implementation would just return: IndexReader.open(path) And the newSearcher and getSearcher methods in the SolrCore class can call the current factory implementation to get the IndexReader instance and then build the SolrIndexSearcher by passing in the reader. It would be really nice to add this improvement soon (this seems to be a trivial addition) as our project really depends on this. Thanks -John -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
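The factory proposed in the issue, made compilable as a self-contained sketch; the IndexReader here is a stand-in type so the example runs on its own, whereas in Solr the default factory would simply delegate to IndexReader.open(path):

```java
// Compilable sketch of the pluggable IndexReaderFactory idea from SOLR-243.
public class ReaderFactoryDemo {

    // Stand-in for Lucene's IndexReader, just enough to demonstrate the hook.
    static class IndexReader {
        final String path;
        IndexReader(String path) { this.path = path; }
    }

    // The interface from the issue: name identifies the core/reader,
    // path is the index directory.
    interface IndexReaderFactory {
        IndexReader newReader(String name, String path);
    }

    // Default behavior: the equivalent of IndexReader.open(path).
    static final IndexReaderFactory DEFAULT = (name, path) -> new IndexReader(path);

    public static void main(String[] args) {
        IndexReader r = DEFAULT.newReader("main", "/var/solr/index");
        System.out.println(r.path); // /var/solr/index
    }
}
```

A custom plugin would supply its own IndexReaderFactory, and SolrCore's newSearcher/getSearcher would build the SolrIndexSearcher from whatever reader the factory returns.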
[jira] Assigned: (SOLR-545) remove MultiCore default core / cleanup DispatchHandler
[ https://issues.apache.org/jira/browse/SOLR-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas reassigned SOLR-545: --- Assignee: Ryan McKinley assigning 1.3 multicore stuff to Ryan remove MultiCore default core / cleanup DispatchHandler --- Key: SOLR-545 URL: https://issues.apache.org/jira/browse/SOLR-545 Project: Solr Issue Type: Bug Affects Versions: 1.3 Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 1.3 MultiCore should require a core name in the URL. If the core name is missing, there should be a 404, not a valid core. That is: http://localhost:8983/solr/select?q=*:* should return 404. While we are at it, we should cleanup the DispatchHandler. Perhaps the best approach is to treat single core as multicore with only one core? As is, the tangle of potential paths is ugly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-489) Added @deprecation Javadoc comments
[ https://issues.apache.org/jira/browse/SOLR-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas reassigned SOLR-489: --- Assignee: Mike Klaas Added @deprecation Javadoc comments --- Key: SOLR-489 URL: https://issues.apache.org/jira/browse/SOLR-489 Project: Solr Issue Type: Bug Components: documentation Reporter: Sean Timm Assignee: Mike Klaas Priority: Trivial Fix For: 1.3 Attachments: deprecationDocumentation.patch In a number of files, @Deprecation annotations were added without accompanying @deprecation Javadoc comments to explain what to use now. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (SOLR-344) New Java API
[ https://issues.apache.org/jira/browse/SOLR-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas closed SOLR-344. --- Resolution: Invalid Let's move this discussion to the wiki and mailing list. It isn't really an open issue for Solr. New Java API Key: SOLR-344 URL: https://issues.apache.org/jira/browse/SOLR-344 Project: Solr Issue Type: Improvement Components: clients - java, search, update Affects Versions: 1.3 Reporter: Jonathan Woods Attachments: New Java API for Solr.pdf The core Solr codebase urgently needs to expose a new Java API designed for use by Java running in Solr's JVM and ultimately by core Solr code itself. This API must be (i) object-oriented ('typesafe'), (ii) self-documenting, (iii) at the right level of granularity, (iv) designed specifically to expose the value which Solr adds over and above Lucene. This is an urgent issue for two reasons: - Java-Solr integrations represent a use-case which is nearly as important as the core Solr use-case in which non-Java clients interact with Solr over HTTP - a significant proportion of questions on the mailing lists are clearly from people who are attempting such integrations right now. This point in Solr development - some way out from the 1.3 release - might be the right time to do the development and refactoring necessary to produce this API. We can do this without breaking any backward compatibility from the point of view of XML/HTTP and JSON-like clients, and without altering the core Solr algorithms which make it so efficient. If we do this work now, we can significantly speed up the spread of Solr. Eventually, this API should be part of core Solr code, not hived off into some separate project nor in a non-first-class package space. 
It should be capable of forming the foundation of any new Solr development which doesn't need to delve into low level constructs like DocSet and so on - and any new development which does need to do just that should be a candidate for incorporation into the API at some level. Whether or not it will ever be worth re-writing existing code is a matter of opinion; but the Java API should be such that if it had existed before core plug-ins were written, it would have been natural to use it when writing them. I've attached a PDF which makes the case for this API. Apologies for delivering it as an attachment, but I wanted to embed pics and a bit of formatting. I'll update this issue in the next few days to give a prototype of this API to suggest what it might look like at present. This will build on the work already done in Solrj and SearchComponents (https://issues.apache.org/jira/browse/SOLR-281), and will be a patch on an up-to-date revision of Solr trunk. [PS: 1. Having written most of this, I then properly looked at SearchComponents/SOLR-281 and read http://www.nabble.com/forum/ViewPost.jtp?post=11050274framed=y, which says much the same thing albeit more quickly! And weeks ago, too. But this proposal is angled slightly differently: - it focusses on the value of creating an API not only for internal Solr consumption, but for local Java clients - it focusses on designing a Java API without constantly being hobbled by HTTP-Java - it's suggesting that the SearchComponents work should result in a Java API which can be used as much by third party Java as by ResponseBuilder. 2. I've made some attempt to address Hoss's point (http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#6551097579454875774) - that an API like this would need to maintain enough state e.g. to allow an initial search to later be faceted, highlighted etc without going back to the start each time - but clearly the proof of the pudding will be in the prototype. 3. 
Again, I've just discovered SOLR-212 (DirectSolrConnection). I think all my comments about Solrj apply to this, useful though it clearly is.] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-200) Scripts don't work when run as root in ~root and su'ing to a user
[ https://issues.apache.org/jira/browse/SOLR-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-200. - Resolution: Won't Fix It doesn't surprise me that /root as the indexdir and / as solr_home doesn't work, being root or not. I don't think that this is an important case. Scripts don't work when run as root in ~root and su'ing to a user - Key: SOLR-200 URL: https://issues.apache.org/jira/browse/SOLR-200 Project: Solr Issue Type: Bug Affects Versions: 1.1.0 Reporter: Jürgen Hermann Priority: Minor This patch avoids an error due to permission problems when orig_dir is /root:
-orig_dir=$(pwd)
-cd ${0%/*}/..
-solr_root=$(pwd)
-cd ${orig_dir}
+solr_root=$(cd ${0%/*}/..; pwd)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-517) highlighter doesn't work with hl.requireFieldMatch=true on un-optimized index
[ https://issues.apache.org/jira/browse/SOLR-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602857#action_12602857 ] Mike Klaas commented on SOLR-517: - Koji: Is this resolved? I seem to recall that we brought this up on java-dev, but I can't find the thread at the moment. (I don't think that the right thing to do is remove idf fetching of the terms as your patch proposes) highlighter doesn't work with hl.requireFieldMatch=true on un-optimized index - Key: SOLR-517 URL: https://issues.apache.org/jira/browse/SOLR-517 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.2, 1.3 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-517.patch, SOLR-517.patch On un-optimized index, highlighter doesn't work with hl.requireFieldMatch=true. see: http://www.nabble.com/hl.requireFieldMatch-and-idf-td16324482.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-522) analysis.jsp doesn't show payloads created/modified by tokenizers and tokenfilters
[ https://issues.apache.org/jira/browse/SOLR-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-522: Fix Version/s: 1.3 analysis.jsp doesn't show payloads created/modified by tokenizers and tokenfilters -- Key: SOLR-522 URL: https://issues.apache.org/jira/browse/SOLR-522 Project: Solr Issue Type: Improvement Components: web gui Reporter: Tricia Williams Assignee: Mike Klaas Priority: Trivial Fix For: 1.3 Attachments: SOLR-522-analysis.jsp.patch, SOLR-522-analysis.jsp.patch Original Estimate: 0.17h Remaining Estimate: 0.17h Add payload content to the verbose output of the analysis.jsp page for debugging purposes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-243) Create a hook to allow custom code to create custom IndexReaders
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602860#action_12602860 ] Mike Klaas commented on SOLR-243: - Hi John, Hoss has marked the issue for 1.3, so it will be in the release. -Mike Create a hook to allow custom code to create custom IndexReaders Key: SOLR-243 URL: https://issues.apache.org/jira/browse/SOLR-243 Project: Solr Issue Type: Improvement Components: search Environment: Solr core Reporter: John Wang Assignee: Hoss Man Fix For: 1.3 Attachments: indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch I have a customized IndexReader and I want to write a Solr plugin to use my derived IndexReader implementation. Currently IndexReader instantiation is hard coded to be: IndexReader.open(path) It would be really useful if this is done thru a pluggable factory that can be configured, e.g. IndexReaderFactory: interface IndexReaderFactory { IndexReader newReader(String name, String path); } the default implementation would just return: IndexReader.open(path) And the newSearcher and getSearcher methods in the SolrCore class can call the current factory implementation to get the IndexReader instance and then build the SolrIndexSearcher by passing in the reader. It would be really nice to add this improvement soon (This seems to be a trivial addition) as our project really depends on this. Thanks -John -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
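The factory John sketches above can be illustrated with a small self-contained mock-up. This is only a sketch: the `IndexReader` stand-in below is a placeholder added so the snippet compiles on its own (in Solr it would be `org.apache.lucene.index.IndexReader`), and only the interface name, method signature, and default-delegation behavior come from the issue text.

```java
// Stand-in for org.apache.lucene.index.IndexReader, just so this sketch
// is self-contained; it records the path it was opened with.
class IndexReader {
    final String path;
    private IndexReader(String path) { this.path = path; }
    static IndexReader open(String path) { return new IndexReader(path); }
}

// The pluggable factory proposed in SOLR-243 (names follow the issue text).
interface IndexReaderFactory {
    IndexReader newReader(String name, String path);
}

// Default implementation: preserves the current hard-coded behavior by
// simply delegating to IndexReader.open(path).
class StandardIndexReaderFactory implements IndexReaderFactory {
    public IndexReader newReader(String name, String path) {
        return IndexReader.open(path);
    }
}

public class Demo {
    public static void main(String[] args) {
        IndexReaderFactory factory = new StandardIndexReaderFactory();
        IndexReader r = factory.newReader("main", "/var/solr/data/index");
        System.out.println(r.path);
    }
}
```

SolrCore's newSearcher/getSearcher would then ask the configured factory for a reader instead of calling IndexReader.open directly.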
Re: [important] call for 1.3 planning
On 21-May-08, at 4:45 PM, Mike Klaas wrote: There seems to be some sort of consensus building that there should be a 1.3 release in the near future. The first step is to figure out what we want to finish before it gets released. The list of JIRA issues currently labeled 1.3 can be found here: http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12312486 Let's try to get an assignee for every issue in that list by a week from now. If nobody steps up for an issue in that time, I'll assume it is low enough priority to move post-1.3. This would also be a good time to add any issues that you want to champion for 1.3. That brings us down to 20 issues, with only 2 unassigned: SOLR-424 and SOLR-410. I removed a few of the feature issues with no assignee. Seems like the big things that need to get done are: - componentized spellchecking - contrib area + data import handler - distributed search -Mike
Re: 3 TokenFilter factories not compatible with 1.2
On 4-Jun-08, at 5:24 PM, Yonik Seeley wrote: On Wed, Jun 4, 2008 at 7:03 PM, Chris Hostetter [EMAIL PROTECTED] wrote: 3) Documentation and Education Since this wasn't exactly a use case we ever advertised, we could punt on the problem by putting a disclaimer in the CHANGES.txt that anyone directly constructing those 3 classes should explicitly call inform() on the instances after calling init. #3 is obviously the simplest approach as developers, and to be quite honest: probably impacts the fewest total number of people (since there are probably very few people constructing Factory instances themselves) +1 +1, perhaps also pinging -user to see if there is a sizable group of people doing this. -Mike
[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602541#action_12602541 ] Mike Klaas commented on SOLR-556: - Ah, I see what the problem is: Although it is impossible for tokens from different values to appear in the same fragment (due to the semantics of MultiValuedTokenFilter), the non-token text (typically, punctuation) from different values can bleed into the same fragment, since lucene's highlighter can only create a new fragment on token boundaries. Unfortunately SOLR-553 was committed a day after you submitted your patch, and rearranges the code slightly so that it no longer applies. Could you sync the patch with trunk? I think the basic approach is sound. Highlighting of multi-valued fields returns snippets which span multiple different values - Key: SOLR-556 URL: https://issues.apache.org/jira/browse/SOLR-556 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: solr-highlight-multivalued-example.xml, solr-highlight-multivalued.patch When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet foo<em>ba</em>r. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet <em>oo</em> regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all.
To reproduce the problem, I've used the following steps:
* create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though)
* search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though)
* highlighted snippets should show effects described above
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-161) Dangling dash causes stack trace
[ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602038#action_12602038 ] Mike Klaas commented on SOLR-161: - It is really a Lucene query parser bug, but it wouldn't hurt to do s/(.*)-// as a workaround. Assuming my ed(1) syntax is still fresh. "Regardless, no query string should ever give a stack trace" This might be hard to guarantee. Already there are four issues detailing specific ways that dismax barfs on input. A lot of the suggestions above are of the form of detecting a specific failure mode and correcting it, which does not guarantee that you will catch them all. A robust way to do it is to parse the query into an AST using a grammar in a way that matches the query as well as possible (dropping the stuff that doesn't fit). Unfortunately, this is duplicative of the lucene parsing logic, and it would be nicer to add a relaxed mode to lucene rather than pre-parsing the query. (The reparse+reassemble method is what we use, btw. It is written in python but it might be possible to translate to java.) Dangling dash causes stack trace Key: SOLR-161 URL: https://issues.apache.org/jira/browse/SOLR-161 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.1.0 Environment: Java 1.5, Tomcat 5.5.17, Fedora Core 4, Intel Reporter: Walter Underwood I'm running tests from our search logs, and we have a query that ends in a dash. That caused a stack trace. org.apache.lucene.queryParser.ParseException: Cannot parse 'digging for the truth -': Encountered EOF at line 1, column 23. Was expecting one of: ( ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ...
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:127) at org.apache.solr.request.DisMaxRequestHandler.handleRequest(DisMaxRequestHandler.java:272) at org.apache.solr.core.SolrCore.execute(SolrCore.java:595) at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
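The workaround Mike suggests (stripping the dangling dash before handing the string to the parser) could be expressed in Java along these lines. This is an illustration of the idea only, not code from Solr: the class and method names are made up, and it handles just this one failure mode, not malformed queries in general.

```java
import java.util.regex.Pattern;

public class QueryClean {
    // Matches one or more "-" characters (and surrounding whitespace)
    // dangling at the very end of the query string.
    private static final Pattern DANGLING_DASH = Pattern.compile("\\s*-+\\s*$");

    // Strip a trailing dash so the query parser doesn't hit EOF where it
    // expects a term, as in the stack trace above.
    static String stripDanglingDash(String q) {
        return DANGLING_DASH.matcher(q).replaceAll("");
    }

    public static void main(String[] args) {
        // The query from Walter's log that triggered the ParseException.
        System.out.println(stripDanglingDash("digging for the truth -"));
    }
}
```

Queries without a trailing dash pass through unchanged, so this can sit safely in front of the parser.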
[jira] Commented: (LUCENE-1293) Tweaks to PhraseQuery.explain()
[ https://issues.apache.org/jira/browse/LUCENE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600973#action_12600973 ] Mike Klaas commented on LUCENE-1293: It is meant for debugging, though I have found it so painfully slow in the past that I have avoided it on occasion. The main culprit is the looped next() call in PhraseScorer.explain(). Using skipTo() would be faster. Tweaks to PhraseQuery.explain() --- Key: LUCENE-1293 URL: https://issues.apache.org/jira/browse/LUCENE-1293 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4 Reporter: Itamar Syn-Hershko Priority: Minor Fix For: 2.4 The explain() function in PhraseQuery.java is very clumsy and could use many optimizations. Perhaps it is only because it is intended for use while debugging? Here's an example:
{noformat}
result.addDetail(fieldExpl);
// combine them
result.setValue(queryExpl.getValue() * fieldExpl.getValue());
if (queryExpl.getValue() == 1.0f)
  return fieldExpl;
return result;
}
{noformat}
Can easily be tweaked and become:
{noformat}
if (queryExpl.getValue() == 1.0f) {
  return fieldExpl;
}
result.addDetail(fieldExpl);
// combine them
result.setValue(queryExpl.getValue() * fieldExpl.getValue());
return result;
}
{noformat}
And that's really just for a start... Itamar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
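Mike's point about looped next() versus skipTo() can be illustrated with a toy model. None of this is Lucene code: the array-backed "posting list" below is purely hypothetical and just counts advance steps, and skipTo() is modeled as a binary search (real Lucene skip lists differ in detail, but the asymptotics are similar).

```java
public class SkipDemo {
    // Looped next(): advance one posting at a time until we reach target.
    static int linearSteps(int[] docs, int target) {
        int steps = 0;
        for (int d : docs) {
            steps++;
            if (d >= target) break;
        }
        return steps;
    }

    // skipTo(): modeled as binary search over the sorted doc ids,
    // so the step count is logarithmic rather than linear.
    static int skipSteps(int[] docs, int target) {
        int lo = 0, hi = docs.length - 1, steps = 0;
        while (lo < hi) {
            steps++;
            int mid = (lo + hi) / 2;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        return Math.max(steps, 1);
    }

    public static void main(String[] args) {
        // 1000 postings with even doc ids; advance to doc 1500.
        int[] docs = new int[1000];
        for (int i = 0; i < docs.length; i++) docs[i] = i * 2;
        System.out.println(linearSteps(docs, 1500) + " vs " + skipSteps(docs, 1500));
    }
}
```

For a single explain() call on a large index, the looped-next() cost is paid once per query term, which is why it shows up as painfully slow.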
Re: Release of SOLR 1.3
On 20-May-08, at 12:32 PM, Shalin Shekhar Mangar wrote: +1 for your suggestions Mike. I'd like to see a few of the smaller issues get committed in 1.3 such as SOLR-256 (JMX), SOLR-536 (binding for SolrJ), SOLR-430 (SpellChecker support in SolrJ) etc. Also, SOLR-561 (replication by Solr) would be really cool to have in the next release. Noble and I are working on it and plan to give a patch soon. Whether something makes it in to this release will depend mostly on getting the buy-in and time commitment from one of the committers familiar with that aspect of the project. There is so much in 1.3 as it is that I think our focus should be on getting it out sooner rather than adding things. But small things that significantly improve the release are good too. SOLR-561 seems like a rather large project to me (although I have never even used the existing collection distribution method). Mike -- you removed SOLR-563 (Contrib area for Solr) from 1.3 but it is a dependency for SOLR-469 (DataImportHandler) as it was decided to have DataImportHandler as a contrib project. It would also be good to have a rough release roadmap to work against. Can a fixed release cycle (say every 6 months) work for Solr? Twice-yearly releases would be nice to aim for, but I think we're too small a project to fix release dates in advance. -Mike
Re: Release of SOLR 1.3
On 22-May-08, at 12:13 AM, Andrew Savory wrote: Sure, Commit-Then-Review vs. Review-Then-Commit ... but I don't actually think RTC is going to ensure significantly more widespread review, given the time burden on other developers to find the issue in JIRA, download the patch, apply the patch, test, respond, then revert the change. Do people really have the time to do that? It's significantly more effort than that to svn update, look at code, and feed back. I prefer detailed discussion on the mailing list (which supports decent threading, quoting etc, unlike JIRA) followed by commit of a trial implementation which can then be refactored. Otherwise there might be a tendency to analysis paralysis. But I'm the new boy here, so I'll STFU and try to help out on the release instead of forcing y'all to rehash old discussions on how to run an open source project ;-) Maybe by the time 1.3 is out the door we'll all be using distributed SCM systems and the discussion will be moot anyway! I think we agree in principle--a patch does not have to be spotless to be committed. I also agree that the mailing list is a preferable place to hash out design details. But it is necessary that the basic approach is one we feel we will stick with before it gets committed. I don't think this imposes much of a burden on people aiming to review a patch. It is true that using patches takes an extra minute or two to set up, but the time to evaluate a contribution is _by far_ mostly contained in understanding the contribution, its implications, and examining the code. Plus, the patch is much easier to back out of a given repository and makes it easier to see exactly what changes were made. Since contributors can't commit to the repository anyway, I don't see much disadvantage in working with patches.
(btw, if you want a one-line equivalent to svn up, try something like:
$ wget http://issues.apache.org/jira/secure/attachment/12381498/SOLR-563.patch -O - | patch -p0
Reverting is also one line:
$ svn revert -R .
Although this leaves added files, which can be removed with
$ svn st | grep '?' | awk '{print $2}' | xargs rm
Another useful trick is to have multiple checkouts of trunk and bounce an active changeset from one to another with
$ svn diff | (cd ../otherbranch; patch -p0)
)
-Mike
[important] call for 1.3 planning
There seems to be some sort of consensus building that there should be a 1.3 release in the near future. The first step is to figure out what we want to finish before it gets released. The list of JIRA issues currently labeled 1.3 can be found here: http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12312486 Let's try to get an assignee for every issue in that list by a week from now. If nobody steps up for an issue in that time, I'll assume it is low enough priority to move post-1.3. This would also be a good time to add any issues that you want to champion for 1.3. (This isn't meant to be a final list, just something to help get us started. Most of the unassigned issues were reported by committers, so that should hopefully make it easy to figure out the assignee.) -Mike
[jira] Updated: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-556: Fix Version/s: 1.3 Highlighting of multi-valued fields returns snippets which span multiple different values - Key: SOLR-556 URL: https://issues.apache.org/jira/browse/SOLR-556 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: solr-highlight-multivalued-example.xml, solr-highlight-multivalued.patch When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet foo<em>ba</em>r. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet <em>oo</em> regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all. To reproduce the problem, I've used the following steps:
* create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though)
* search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though)
* highlighted snippets should show effects described above
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-536) Automatic binding of results to Beans (for solrj)
[ https://issues.apache.org/jira/browse/SOLR-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-536: Fix Version/s: (was: 1.3) Automatic binding of results to Beans (for solrj) - Key: SOLR-536 URL: https://issues.apache.org/jira/browse/SOLR-536 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.3 Reporter: Noble Paul Priority: Minor Attachments: SOLR-536.patch As we are using Java 5, we can use annotations to bind SolrDocument to Java beans directly. This can make the usage of solrj a bit simpler. The QueryResponse class in solrj can have an extra method as follows: public <T> List<T> getResultBeans(Class<T> klass) and the bean can have annotations as:
class MyBean {
  @Field("id") // name is optional
  String id;
  @Field("category")
  List<String> categories;
}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
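One way the proposed annotation binding might work under the hood is plain reflection. The following is a hypothetical sketch based only on the issue text, not the attached patch: the annotation is named @SolrField here (the issue proposes @Field, which would clash with java.lang.reflect.Field in a self-contained snippet), and the document is modeled as a Map rather than a SolrDocument.

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;

// Hypothetical stand-in for the proposed @Field annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SolrField {
    String value() default "";  // Solr field name; defaults to the Java field name
}

class MyBean {
    @SolrField("id") String id;
    @SolrField("category") List<String> categories;
}

public class BeanBinder {
    // Copy values from a document (modeled as a Map) onto an annotated bean.
    static <T> T bind(Map<String, Object> doc, Class<T> klass) throws Exception {
        T bean = klass.getDeclaredConstructor().newInstance();
        for (Field f : klass.getDeclaredFields()) {
            SolrField ann = f.getAnnotation(SolrField.class);
            if (ann == null) continue;  // unannotated fields are skipped
            String name = ann.value().isEmpty() ? f.getName() : ann.value();
            Object value = doc.get(name);
            if (value != null) {
                f.setAccessible(true);
                f.set(bean, value);
            }
        }
        return bean;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "doc-1");
        doc.put("category", Arrays.asList("books", "tech"));
        MyBean b = bind(doc, MyBean.class);
        System.out.println(b.id + " " + b.categories);
    }
}
```

A real implementation would also need type conversion (e.g. single value vs. multi-valued fields), but the reflection walk above is the core of the idea.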
[jira] Updated: (SOLR-579) Extend SimplePost with RecurseDirectories, threads, document encoding , number of docs per commit
[ https://issues.apache.org/jira/browse/SOLR-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-579: Fix Version/s: (was: 1.3) Extend SimplePost with RecurseDirectories, threads, document encoding, number of docs per commit - Key: SOLR-579 URL: https://issues.apache.org/jira/browse/SOLR-579 Project: Solr Issue Type: New Feature Affects Versions: 1.3 Environment: Applies to all platforms Reporter: Patrick Debois Priority: Minor Original Estimate: 72h Remaining Estimate: 72h When specifying a directory, simplepost should also read the contents of the directory. New options for the commandline (some only useful in DATAMODE=files):
-RECURSEDIRS Recursive read of directories as an option; this is useful for directories with a lot of files where the commandline expansion fails and xargs is too slow
-DOCENCODING (default = system encoding or UTF-8) For non-UTF-8 clients, simplepost should include a way to set the encoding of the documents posted
-THREADSIZE (default = 1) For large volume posts, a threading pool makes sense, using the JDK 1.5 Threadpool model
-DOCSPERCOMMIT (default = 1) Number of documents after which a commit is done, instead of only at the end
Note: this should not break the existing behaviour of the existing SimplePost tool (post.sh), which might be used in scripts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-383) Add support for globalization/culture management
[ https://issues.apache.org/jira/browse/SOLR-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-383. - Resolution: Fixed Fix Version/s: (was: 1.3) Add support for globalization/culture management Key: SOLR-383 URL: https://issues.apache.org/jira/browse/SOLR-383 Project: Solr Issue Type: Improvement Components: clients - C# Affects Versions: 1.3 Reporter: Jeff Rodenburg Assignee: Jeff Rodenburg Priority: Minor SolrSharp should supply configuration and/or programmatic control over windows culture settings. This is important for working with data being saved to indexes that carry certain formatting expectations for various types of fields, both in SolrSharp as well as the solr field counterparts on the server side. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-563) Contrib area for Solr
[ https://issues.apache.org/jira/browse/SOLR-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-563: Fix Version/s: (was: 1.3) Contrib area for Solr - Key: SOLR-563 URL: https://issues.apache.org/jira/browse/SOLR-563 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Attachments: SOLR-563.patch Add a contrib area for Solr and modify existing build.xml to build, package and distribute contrib projects also. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-565) Component to abstract shards from clients
[ https://issues.apache.org/jira/browse/SOLR-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-565: Fix Version/s: (was: 1.3) Component to abstract shards from clients - Key: SOLR-565 URL: https://issues.apache.org/jira/browse/SOLR-565 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: patrick o'leary Priority: Minor Attachments: distributor_component.patch A component that will remove the need for calling clients to provide the shards parameter for a distributed search. As systems grow, it's better to manage shards with in solr, rather than managing each client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-551) Solr replication should include the schema also
[ https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-551: Fix Version/s: (was: 1.3) Solr replication should include the schema also --- Key: SOLR-551 URL: https://issues.apache.org/jira/browse/SOLR-551 Project: Solr Issue Type: Improvement Components: replication Affects Versions: 1.3 Reporter: Noble Paul The current Solr replication just copies the data directory. So if the schema changes and I do a re-index, it will blissfully copy the index and the slaves will fail because of an incompatible schema. So the steps we follow are:
* Stop rsync on slaves
* Update the master with the new schema
* Re-index data
* For each slave:
** Kill the slave
** Clean the data directory
** Install the new schema
** Restart
** Do a manual snappull
The amount of work the admin needs to do is quite significant (depending on the number of slaves). These are manual steps and very error prone. The solution: make the replication mechanism handle the schema replication also, so all I need to do is change the master and the slaves sync automatically. What is a good way to implement this? We have an idea along the following lines. This should involve changes to the snapshooter and snappuller scripts and the snapinstaller components. Every time the snapshooter takes a snapshot it must keep the timestamps of schema.xml and elevate.xml (all the files which might affect the runtime behavior in slaves). For subsequent snapshots, if the timestamp of any of them has changed it must copy all of them for replication as well. The snappuller copies the new directory as usual. The snapinstaller checks if these config files are present; if yes:
* It can create a temporary core
* Install the changed index and configuration
* Load it completely and swap it with the original core
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.