Google-developed posting list encoding

2010-04-14 Thread Mike Klaas
It can be quite a bit faster than vInt in some cases:
http://www.ir.uwaterloo.ca/book/addenda-06-index-compression.html
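For context, here is a minimal sketch of the vInt format being compared against: seven payload bits per byte, with the high bit set on every byte except the last. The group-varint scheme discussed in the linked chapter instead packs the byte-lengths of four integers into one leading tag byte, so decoding needs no per-byte branches. Class and method names below are illustrative, not Lucene's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Lucene-style vInt coding: 7 payload bits per byte,
// high bit acts as a continuation flag.
class VIntDemo {
    static List<Integer> encode(int value) {
        List<Integer> bytes = new ArrayList<>();
        while ((value & ~0x7F) != 0) {         // more than 7 bits remain
            bytes.add((value & 0x7F) | 0x80);  // continuation bit set
            value >>>= 7;
        }
        bytes.add(value);                      // final byte, high bit clear
        return bytes;
    }

    static int decode(List<Integer> bytes) {
        int value = 0, shift = 0;
        for (int b : bytes) {
            value |= (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        assert decode(encode(300)) == 300;
        assert encode(127).size() == 1 && encode(128).size() == 2;
        System.out.println("ok");
    }
}
```

The branch in the decode loop (one test per byte) is exactly the cost group varint amortizes away.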

-Mike

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: [jira] Created: (SOLR-1363) Search without using caches

2009-08-14 Thread Mike Klaas
Keep in mind that there is no way to bypass the most important cache of all
(the OS disk cache).
-Mike

On Thu, Aug 13, 2009 at 12:01 PM, Jason Rutherglen (JIRA)
j...@apache.org wrote:


 For testing, I often need to perform a query and see the actual time it
 takes (rather than the time it takes to look it up from the cache).  We'll
 need various options such as bypass the docsets, docs, or results.

 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.




Re: [jira] Updated: (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit

2009-05-29 Thread Mike Klaas
I'd like to take a look at this but JIRA seems to be down. Is anyone else
experiencing this?

-Mike


On Wed, May 13, 2009 at 7:41 AM, Jayson Minard (JIRA) j...@apache.org wrote:


 [
 https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

 Jayson Minard updated SOLR-1155:
 

 Attachment: Solr-1155.patch

 Resolve TODO for commitWithin, and updated AutoCommitTrackerTest to
 validate the fix.

  Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
  -
 
  Key: SOLR-1155
  URL: https://issues.apache.org/jira/browse/SOLR-1155
  Project: Solr
   Issue Type: Improvement
   Components: search
 Affects Versions: 1.3
 Reporter: Jayson Minard
  Attachments: Solr-1155.patch, Solr-1155.patch
 
 
  Currently DirectUpdateHandler2 will block adds during a commit, and it
 seems to be possible with recent changes to Lucene to allow them to run
 concurrently.
  See:
 http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html

 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.




[jira] Commented: (SOLR-1169) SortedIntDocSet

2009-05-14 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709645#action_12709645
 ] 

Mike Klaas commented on SOLR-1169:
--

Sweet.  Intersecting sorted int sets should be faster in the general case.
HashSet will of course win when one set is very small, but I expect this to
still be pretty fast anyway.
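As a rough illustration of why sorted intersection is attractive, here is a hypothetical two-pointer merge over two sorted doc-id arrays. The real SortedIntDocSet additionally skips ahead rather than stepping one element at a time; this simplified version just shows the linear merge that beats hashing once both sets are non-trivial in size.

```java
import java.util.Arrays;

// Simplified sketch: intersect two sorted int arrays with a linear
// two-pointer merge. A HashSet lookup wins only when one side is tiny.
class SortedIntersect {
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;            // advance the smaller side
            else if (a[i] > b[j]) j++;
            else { out[n++] = a[i]; i++; j++; } // match: emit and advance both
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] r = intersect(new int[]{1, 3, 5, 9}, new int[]{3, 4, 5, 10});
        assert Arrays.equals(r, new int[]{3, 5});
        System.out.println(Arrays.toString(r));
    }
}
```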

 SortedIntDocSet
 ---

 Key: SOLR-1169
 URL: https://issues.apache.org/jira/browse/SOLR-1169
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 1.4


 A DocSet type that can skip to support SOLR-1165

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: DirectUpdateHandler2 threads pile up behind scheduleCommitWithin

2009-05-11 Thread Mike Klaas

Hi Jayson,

Thanks, I'll take a look in the next few days.  The current patch  
doesn't guarantee index consistency during post-commit callback hooks,  
right?  This could be a problem for index replication.  (Incidentally,  
I'm rather unfamiliar with the new java-based replication design.   
Anyone care to comment on the implications?)


cheers,
-Mike

On 10-May-09, at 10:54 AM, jayson.minard wrote:



Mike,

I revamped the DirectUpdateHandler2 into DirectUpdateHandler3 in SOLR-1155,
probably ready enough for your review to see if locking makes sense for
current Lucene behavior.

https://issues.apache.org/jira/browse/SOLR-1155

--j


Mike Klaas wrote:


On 7-May-09, at 10:36 AM, jayson.minard wrote:



Does every thread really need to notify the update handler of the commit
interval/threshold being reached, or really just the first thread that
notices should send the signal, or better yet a background commit watching
thread so that no foreground thread has to pay attention at all.  That is
assuming they wouldn't need to block like they are now for a reason I'm
likely unaware of...


This is due to the way Lucene was designed (although recent
improvements in Lucene mean we can do better here).  See the recent
thread "Autocommit blocking adds?" on solr-user for a related
discussion.

As the person who first wrote the multi-threaded-ness of DUH2, I'd be
very happy to promptly review any improvements made to it.

-Mike




--
View this message in context: 
http://www.nabble.com/DirectUpdateHandler2-threads-pile-up-behind-scheduleCommitWithin-tp23431691p23472391.html
Sent from the Solr - Dev mailing list archive at Nabble.com.





Re: DirectUpdateHandler2 threads pile up behind scheduleCommitWithin

2009-05-08 Thread Mike Klaas

On 7-May-09, at 10:36 AM, jayson.minard wrote:



Does every thread really need to notify the update handler of the commit
interval/threshold being reached, or really just the first thread that
notices should send the signal, or better yet a background commit watching
thread so that no foreground thread has to pay attention at all.  That is
assuming they wouldn't need to block like they are now for a reason I'm
likely unaware of...


This is due to the way Lucene was designed (although recent
improvements in Lucene mean we can do better here).  See the recent
thread "Autocommit blocking adds?" on solr-user for a related
discussion.


As the person who first wrote the multi-threaded-ness of DUH2, I'd be  
very happy to promptly review any improvements made to it.


-Mike


Re: Welcome new Solr committers Mark Miller and Noble Paul

2009-04-30 Thread Mike Klaas

On 30-Apr-09, at 10:41 AM, Yonik Seeley wrote:


I'm pleased to announce that Mark Miller and Noble Paul have accepted
invitations to become Solr committers!
Welcome Mark & Noble, and thanks for all your great work on Solr!


Congratulations Mark and Noble!   Good to have you on board.

-Mike


[jira] Commented: (SOLR-1116) Add a Binary FieldType

2009-04-29 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704284#action_12704284
 ] 

Mike Klaas commented on SOLR-1116:
--

+1 for url-safe base64 (-_ being the extra chars)
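For illustration, URL-safe base64 (RFC 4648 "base64url") differs from the standard alphabet only in those two characters, so binary field values survive inside URLs untouched. This sketch uses the java.util.Base64 API (Java 8+) for brevity, not anything in Solr itself.

```java
import java.util.Base64;

// Demonstrates that base64url is standard base64 with '+' -> '-'
// and '/' -> '_'. The sample bytes are chosen to hit both characters.
class UrlSafeB64 {
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xfb, (byte) 0xff, 0x00};
        String std = Base64.getEncoder().encodeToString(data); // "+/8A"
        String url = urlSafe(data);                            // "-_8A"
        assert url.equals(std.replace('+', '-').replace('/', '_'));
        System.out.println(std + " -> " + url);
    }
}
```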

 Add a Binary FieldType
 --

 Key: SOLR-1116
 URL: https://issues.apache.org/jira/browse/SOLR-1116
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1116.patch, SOLR-1116.patch


 Lucene supports binary data for fields but Solr has no corresponding field
 type.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Modularization

2009-03-23 Thread Mike Klaas


On 23-Mar-09, at 2:41 PM, Michael McCandless wrote:


I agree, but at least we need some clear criteria so the future
decision process is more straightforward.  Towards that... it seems
like there are good reasons why something should be put into contrib:

 * It uses a version of JDK higher than what core can allow

 * It has external dependencies

 * Its quality is debatable (or at least not proven)

 * It's of somewhat narrow usage/interest (eg: contrib/bdb)

But I don't think "it doesn't have to be in core" (the software
modularity goal) is the right reason to put something in contrib.


Agreed.  I don't think that building on the existing 'contrib' is the
way to go.  Frequently-used, high-quality components should more
properly be part of Lucene, whether that means that they move to core,
or into a new blessed modules section.



Getting back to the original topic: Trie(Numeric)RangeFilter runs on
JDK 1.4, has no external dependencies, looks to be high quality, and
likely will have wide appeal.  Doesn't it belong in core?


+1.  It is important that Lucene come blessed with very good quality  
defaults.  Fast range queries are a common requirement.  Similarly, I  
wouldn't be happy to have a new, wicked QueryParser be relegated to  
contrib where it is unlikely to be found by non-savvy users.  At the  
very least, I agree with Michael that it should be findable in the  
same place.


It does make sense to separate the machinery/building blocks (base
Query, Weight, Scorer, Filter classes, Similarity interface, etc.)
from the Query/Filter implementations that use them.  But whether this
is done by putting them in separate directories or via a global
core/modules distinction seems unimportant.


-Mike

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-03-23 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688449#action_12688449
 ] 

Mike Klaas commented on LUCENE-1561:


I agree that it is going to be almost impossible to convey that phrase queries 
don't work by renaming the flag.  I agree with Eks Dev that a positive 
formulation is the only chance, although this deviates from the current omit* 
flags.

termPresenceOnly()
trackTermPresenceOnly()
onlyTermPresence()
omitEverythingButTermPresence() // just kidding


 Maybe rename Field.omitTf, and strengthen the javadocs
 --

 Key: LUCENE-1561
 URL: https://issues.apache.org/jira/browse/LUCENE-1561
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9

 Attachments: LUCENE-1561.patch


 Spinoff from here:
   
 http://www.nabble.com/search-problem-when-indexed-using-Field.setOmitTf()-td22456141.html
 Maybe rename omitTf to something like omitTermPositions, and make it clear 
 what queries will silently fail to work as a result.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Getting tokens from search results. Simple concept

2009-03-06 Thread Mike Klaas

On 5-Mar-09, at 2:42 PM, Chris Hostetter wrote:



: What I would LOVE is if I could do it in a standard Lucene search like I
: mentioned earlier.
: Hit.doc[0].getHitTokenList() :confused:
: Something like this...

The Query/Scorer APIs don't provide any mechanism for information like
that to be conveyed back up the call chain -- mainly because it's more
heavyweight than most people need.

If you have custom Query/Scorer implementations, you can keep track of
whatever state you want when executing a Query -- in fact the SpanQuery
family of queries do keep track of exactly the type of info you seem to
want, and after executing a query, you can ask it for the Spans of any
matching document -- the downside is a loss in performance of query
execution (because it takes time/memory to keep track of all the matches)


Even then, if I'm not mistaken, spans track token _positions_, not  
_offsets_ in the original string.


An inverted index like Lucene is fast precisely because it doesn't
have to keep track of this information.  I think the best alternative
might be to use term vectors, which are essentially a cache of the
analyzed tokens for a document.


-Mike

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-05 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679466#action_12679466
 ] 

Mike Klaas commented on SOLR-1044:
--

{quote} I haven't yet seen a HTTP server serving more than around 1200 req/sec
(apache HTTPD). A call based server can serve 4k-5k messages easily. (I am
yet to test hadoop RPC). The proliferation of a large no. of frameworks around
that is a testimony to the superiority of that approach. {quote}

up to 50,000 req/sec, with keepalive: 
http://www.litespeedtech.com/web-server-performance-comparison-litespeed-2.0-vs.html

 Use Hadoop RPC for inter Solr communication
 ---

 Key: SOLR-1044
 URL: https://issues.apache.org/jira/browse/SOLR-1044
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Noble Paul

 Solr uses http for distributed search . We can make it a whole lot faster if 
 we use an RPC mechanism which is more lightweight/efficient. 
 Hadoop RPC looks like a good candidate for this.  
 The implementation should just have one protocol. It should follow the Solr's 
 idiom of making remote calls . A uri + params +[optional stream(s)] . The 
 response can be a stream of bytes.
 To make this work we must make the SolrServer implementation pluggable in 
 distributed search. Users should be able to choose between the current 
 CommonshttpSolrServer, or a HadoopRpcSolrServer . 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-952) duplicated code in (Default)SolrHighlighter and HighlightingUtils

2009-02-18 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674830#action_12674830
 ] 

Mike Klaas commented on SOLR-952:
-

HighlightingUtils has been deprecated for at least one release; can't we just 
rip it out?

 duplicated code in (Default)SolrHighlighter and HighlightingUtils
 -

 Key: SOLR-952
 URL: https://issues.apache.org/jira/browse/SOLR-952
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.4
Reporter: Chris Harris
Priority: Minor
 Attachments: SOLR-952.patch


 A large quantity of code is duplicated between the deprecated 
 HighlightingUtils class and the newer SolrHighlighter and 
 DefaultSolrHighlighter (which have been getting bug fixes and enhancements). 
 The Utils class is no longer used anywhere in Solr, but people writing 
 plugins may be taking advantage of it, so it should be cleaned up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE] LOGO

2008-12-19 Thread Mike Klaas


On 17-Dec-08, at 5:11 PM, Ryan McKinley wrote:


Hoss - can you go ahead and post something?   I'm heading out... but  
could post tomorrow.



Since the community has been notified, any objections to me updating  
the site with the new logo/favicon?


-Mike


Re: [jira] Commented: (SOLR-912) org.apache.solr.common.util.NamedList - Typesafe efficient variant - ModernNamedList introduced - implementing the same API as NamedList

2008-12-19 Thread Mike Klaas


On 19-Dec-08, at 8:27 AM, Kay Kay (JIRA) wrote:


Meanwhile - w.r.t resize() - ( trade-off because increasing size a  
lot would increase memory usage.  increase a size by a smaller  
factor would be resulting in a more frequent increases in size). I  
believe reading some theory that the ideal increase factor is  
somewhere close to  ( 1 + 2^0.5) / 2  or something similar to that.


It should be benchmarked, but yes, a factor of two is typically more
memory-wasteful than the performance it gains (you have a 50% chance
of wasting at least 1/4 of your memory, a 25% chance of wasting at
least 3/8ths, etc.)


The method - ensureCapacity(capacity) in ArrayList (Java 6) - also
seems to use a growth factor along the lines of ~1.5:


int newCapacity = (oldCapacity * 3)/2 + 1;

The +1 seems to be to move away from 0, and keep incrementing the count.
( Hmm .. That piece of code - in Java 6 ArrayList - could definitely
make use of bitwise operators for the div-by-2 operation !!).


Let's not go crazy here, guys.  This relatively trivial calculation is
only called log(n) times, and certainly uses bit ops after the JIT
gets its hands on it.
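To put a number on the log(n) claim, here is a quick sketch applying the Java 6 ArrayList growth step quoted earlier in the thread. The class name is illustrative; the formula is the one from the quoted source line.

```java
// The Java 6 ArrayList growth step, applied repeatedly: each resize
// multiplies capacity by roughly 1.5, so the number of resizes for n
// appends is O(log n) regardless of the exact factor chosen.
class GrowthDemo {
    static int grow(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    public static void main(String[] args) {
        int capacity = 10, resizes = 0;
        while (capacity < 1000000) {
            capacity = grow(capacity);
            resizes++;
        }
        // ~28 resizes to grow from 10 to a million elements
        System.out.println(resizes + " resizes to reach " + capacity);
        assert resizes < 40;
    }
}
```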


-Mike


Re: [VOTE] LOGO

2008-12-16 Thread Mike Klaas


On 13-Dec-08, at 2:52 PM, Ryan McKinley wrote:


Ok, all votes are cast (except Grant who is abstaining)


Thanks for tallying the votes, Ryan.  You're too damn quick for me!

-Mike



Re: [VOTE] LOGO

2008-12-12 Thread Mike Klaas

https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394218/solr-solid.png


Re: [VOTE] LOGO

2008-12-11 Thread Mike Klaas
I agree.  I don't see why there needs to be a minimum or maximum
number of logos to rank per vote.


-Mike

On 10-Dec-08, at 7:52 PM, Yonik Seeley wrote:


Doesn't limiting to top 4 defeat the purpose of using STV to overcome
splitting-the-vote?
Seems like we should rank the whole list (or all that an individual
finds acceptable)

-Yonik

On Wed, Dec 10, 2008 at 8:51 PM, Ryan McKinley ryan...@gmail.com  
wrote:
This thread is for solr committers to list the top 4 logo preferences
from the community logo contest.  As a guide, we should look at:
http://people.apache.org/~ryan/solr-logo-results.html

The winner will be tabulated using instant runoff voting -- if this
happens to result in a tie, the winner will be picked by the 'Single
transferable vote'
http://en.wikipedia.org/wiki/Instant-runoff_voting
http://en.wikipedia.org/wiki/Single_transferable_vote

To cast a valid vote, you *must* include 4 options.

ryan




Re: logo contest

2008-12-10 Thread Mike Klaas


On 8-Dec-08, at 10:47 AM, Mike Klaas wrote:


On 7-Dec-08, at 7:40 PM, Chris Hostetter wrote:



: I would personally prefer more of an elimination-style vote  
(i.e., STV).


Ah... yeah, that seems like it would be a more fair way to deal with
things than my suggestion, and it doesn't violate the spirit of the
rules as originally outlined (it's still a vote of ranked
preferences).  Are you volunteering to do the vote counting Mike?


Sure thing.


I take it that there are no objections?  If so, I'll call a vote by  
the end of the week.


cheers,
-Mike


Re: logo contest

2008-12-10 Thread Mike Klaas


On 10-Dec-08, at 12:41 PM, Yonik Seeley wrote:


Sure thing.


I take it that there are no objections?  If so, I'll call a vote by  
the end

of the week.


+1
I just wish we had used this method with the community vote.

I guess as a committer I should try and figure out what order the
community would have voted and do that.


I could run the results of the community vote interpreted as STV, if  
that would help (it'll be a few days, though).


-Mike


Re: logo contest

2008-12-08 Thread Mike Klaas

On 7-Dec-08, at 7:40 PM, Chris Hostetter wrote:



: I would personally prefer more of an elimination-style vote  
(i.e., STV).


Ah... yeah, that seems like it would be a more fair way to deal with
things than my suggestion, and it doesn't violate the spirit of the
rules as originally outlined (it's still a vote of ranked
preferences).  Are you volunteering to do the vote counting Mike?


Sure thing.

-Mike


Re: logo contest

2008-12-04 Thread Mike Klaas

On 4-Dec-08, at 2:33 PM, Chris Hostetter wrote:



: Being the likely two candidates for winning.  My guess is that
: narrowing to the two most popular options first would make #2 the
: winner, while voting on the top 10 (w/o any strategy for winning)
: would make #1 the winner.

limiting to only voting for the top 2 seems unrepresentative since more
than one apache_solr_c_red.jpg variant tied for 2nd.

: fun, fun.  So people who want one of these options to win should vote
: only for that option, really.

Perhaps instead of just ranking top 5, we should ask committers to
rank all of the choices on the final ballot to eliminate the
strategy factor you are referring to ... i think we can trust all
committers to understand this, but if someone botches it (or refuses?)
we'll just shift the number of points each item earns down by the
appropriate number (so if you want your 1st rank to earn 10
points, you must list all 10, if you only list 4 then your top
ranked item only earns 4 points)


Eliminating strategic voting merely biases the outcome toward the logo  
without the vote splitting problem.  That is no solution.
It is better to allow strategic voting, as that is the only way for  
voters to express certain preferences in this system.


I would personally prefer more of an elimination-style vote (i.e.,
STV).  Each voter lists the logos they prefer, in order.  The logos
are ranked by first-place votes.  The last in the rank is eliminated
from the contest, and anyone who had that logo as their first-place
vote has their vote transferred to the next logo on the list, if any.
Iterate until two logos remain.  There is no danger of vote-splitting
and the outcome maximizes global welfare in terms of binary
preferences (well, probably not, due to Arrow's theorem, but it does a
good job regardless).
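The elimination procedure described above can be sketched as follows. This is a simplified count for illustration (ties among last-place candidates are broken arbitrarily here; a real tally would need an explicit tie-break rule):

```java
import java.util.*;

// Instant-runoff elimination: each ballot is an ordered preference list;
// repeatedly drop the candidate with the fewest first-place votes and
// transfer those ballots to their next surviving choice, until two remain.
class IrvDemo {
    static List<String> run(List<List<String>> ballots, Set<String> candidates) {
        Set<String> alive = new HashSet<>(candidates);
        while (alive.size() > 2) {
            Map<String, Integer> firsts = new HashMap<>();
            for (String c : alive) firsts.put(c, 0);
            for (List<String> ballot : ballots)
                for (String choice : ballot)
                    if (alive.contains(choice)) {       // first surviving choice
                        firsts.merge(choice, 1, Integer::sum);
                        break;
                    }
            String loser = Collections.min(firsts.entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            alive.remove(loser);
        }
        return new ArrayList<>(alive);
    }

    public static void main(String[] args) {
        List<List<String>> ballots = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("A", "C"),
                Arrays.asList("B", "C"), Arrays.asList("C", "B"),
                Arrays.asList("C", "A"));
        List<String> finalists = run(ballots, new HashSet<>(Arrays.asList("A", "B", "C")));
        assert finalists.size() == 2;
        System.out.println(finalists);
    }
}
```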


-Mike


Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-20 Thread Mike Klaas


On 19-Nov-08, at 5:12 AM, Michael McCandless (JIRA) wrote:


How can the VM system possibly make good decisions about what to swap
out?  It can't know if a page is being used for terms dict index,
terms dict, norms, stored fields, postings.  LRU is not a good policy,
because some pages (terms index) are far far more costly to miss than
others.


A note on this discussion: we recently re-architected a large
database-y, lucene-y system to use mmap-based storage and are
extremely pleased with the performance.  Sharing the buffers among
processes is rather cool, as Marvin mentions, as is the
near-instantaneous startup.


-Mike

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Optimizing range constraints

2008-11-20 Thread Mike Klaas
Tim Sturge posted a nice optimization for range constraints/filters
(e.g. age:[10 TO 35]) here:

https://issues.apache.org/jira/browse/LUCENE-1461

It has a natural applicability to Solr's fq range filters, which can
be abysmally slow for large ranges.  Could be an interesting project
for contributors who love optimizing speed (100-fold, in this case)
<g>.  I'd definitely do it had I the time.


-Mike


Re: Deadlock with DirectUpdateHandler2

2008-11-18 Thread Mike Klaas


On 18-Nov-08, at 8:54 AM, Mark Miller wrote:


Mark Miller wrote:

Toby Cole wrote:
Has anyone else experienced a deadlock when the  
DirectUpdateHandler2 does an autocommit?
I'm using a recent snapshot from hudson (apache- 
solr-2008-11-12_08-06-21), and quite often when I'm loading data  
the server (tomcat 6) gets stuck at line 469 of  
DirectUpdateHandler2:


  // Check if there is a commit already scheduled for longer than this time
  if( pending != null &&
      pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime )

Anyone got any enlightening tips?



There is some inconsistent synchronization I think.  Especially
involving pending.  Yuck <g>
I would say there are problems with pending, autoCommitCount, and
lastAddedTime.  That alone could probably cause a deadlock (who
knows), but it also seems somewhat possible that there is an issue
with the heavy intermingling of locks (there are a bunch of locks to be
had in that class).  I haven't looked for evidence of that though -
prob makes sense to fix those 3 guys and see if you get reports from
there.



autoCommitCount is written in a CommitTracker.synchronized block
only.  It is read to print stats in an unsynchronized fashion, which
perhaps could be fixed, though I can't see how it could cause a problem.


lastAddedTime is only written in a call path within a  
DirectUpdateHandler2.synchronized block.  It is only read in a  
CommitTracker.synchronized block.  It could read the wrong value, but  
I also don't see this causing a problem (a commit might fail to be  
scheduled).  This could probably also be improved, but doesn't seem  
important.


pending seems to be the issue.  As long as commits are only triggered
by autocommit, there is no issue, as manipulation of pending is always
performed inside CommitTracker.synchronized.  But
didCommit()/didRollback() could be called via manual commit, and
pending is directly manipulated during DUH2.close().  I'm having
trouble coming up with a plausible deadlock scenario, but this needs
to be fixed.  It isn't as easy as synchronizing
didCommit/didRollback, though--this would introduce definite deadlock
scenarios.


Mark, is there any chance you could post the thread dump for the  
deadlocked process?  Do you issue manual commits during insertion?


-Mike


Re: [jira] Commented: (SOLR-84) Logo Contests

2008-11-14 Thread Mike Klaas


On 14-Nov-08, at 8:54 AM, Doug Cutting (JIRA) wrote:



   [ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647660#action_12647660 ]


Doug Cutting commented on SOLR-84:
--

I like https://issues.apache.org/jira/secure/attachment/12349896/logo-solr-e.jpg 
 and https://issues.apache.org/jira/secure/attachment/12358494/sslogo-solr.jpg 
, because they're simple and scale down well.  It should be possible  
to scale the logo, or a salient part of it, as small as a favicon  
(16x16) and still have it easily recognized.  Most of the designs  
above require a lot of pixels to be recognizable.  A good logo  
should be iconic more than textual--an abstract symbol.


Often you can sample an element of a logo to form a favicon (like we  
do with Lucene's 'L').  So, when voting, think about whether there's  
an easily identifiable sample (e.g., is the typeface of the 'S'  
distinctive?).


Lots of the designs do have distinctive suns that would make good  
favicons (after re-vectorizing; those gradients would not rescale  
nicely).


-Mike


Re: ReentrantReadWriteLock in DUH2

2008-11-06 Thread Mike Klaas


On 6-Nov-08, at 7:48 AM, Koji Sekiguchi wrote:

 So that multiple threads can efficiently access the writer, but  
only one thread at a time does a commit.
 Adding docs with the writer is the 'read' and committing is the  
write. If I remember correctly.


You remember correctly, Mark.  Because of the lock, <add/> is blocked
during <optimize/>, even if ConcurrentMergeScheduler is used, right?
I'd like to know why <add/> should be blocked during <optimize/>.


The core reason is laid out in the comment:

  // open a new searcher in the sync block to avoid opening it
  // after a deleteByQuery changed the index, or in between deletes
  // and adds of another commit being done.

We want to open a searcher that corresponds exactly to the commit
point (remember, an optimize is first and foremost a commit).


I don't see why there couldn't be an optimize command that doesn't  
commit, if that is desired.


-Mike


[jira] Commented: (SOLR-793) set a commit time bounds in the add command

2008-10-12 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638825#action_12638825
 ] 

Mike Klaas commented on SOLR-793:
-

I don't see any issue with the code: addedDocument is always called within a
synchronized context anyway, after all.

One question: right now you have it set to use the minimum of
autocommit/maxTime and commitWithin on the update command.  Might it be better
to always use commitWithin, even if it is greater than a specified maxTime?  This
would allow the insertion of less-important-than-normal docs (right now, it
seems only useful for the more-important case).

 set a commit time bounds in the add command
 -

 Key: SOLR-793
 URL: https://issues.apache.org/jira/browse/SOLR-793
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-793-commitWithin.patch, SOLR-793-commitWithin.patch


 Currently there are two options for how to handle committing documents:
 1. the client explicitly starts the commit via <commit/>
 2. set an auto commit value on the server -- clients can assume all documents
 will be committed within that time.
 However, this does not help in the case where the clients know what documents
 need updating quickly and others that could wait.  I suggest adding:
 {code:xml}
  <add commitWithin="100" ...>
 {code}
 to the update syntax so the client can schedule commits explicitly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-793) set a commit time bounds in the add command

2008-10-06 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637173#action_12637173
 ] 

Mike Klaas commented on SOLR-793:
-

Hey Ryan,

I think this is good functionality and will take a look at the synchro stuff in
the next day or so.  I feel somewhat responsible, being the one who inflicted
it on everyone :)

 set a commit time bounds in the add command
 -

 Key: SOLR-793
 URL: https://issues.apache.org/jira/browse/SOLR-793
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-793-commitWithin.patch, SOLR-793-commitWithin.patch


 Currently there are two options for how to handle committing documents:
 1. the client explicitly starts the commit via <commit/>
 2. set an auto commit value on the server -- clients can assume all documents
 will be committed within that time.
 However, this does not help in the case where the clients know what documents
 need updating quickly and others that could wait.  I suggest adding:
 {code:xml}
  <add commitWithin="100" ...>
 {code}
 to the update syntax so the client can schedule commits explicitly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Setting Fix Version in JIRA

2008-09-23 Thread Mike Klaas

On 23-Sep-08, at 12:33 PM, Otis Gospodnetic wrote:


Hi,

When people add new issues to JIRA they most often don't set the  
Fix Version field.  Would it not be better to have a default value  
for that field, so that new entries don't get forgotten when we  
filter by Fix Version looking for issues to fix for the next  
release?  If every issue had Fix Version set we'd be able to  
schedule things better, give reporters and others more insight into  
when a particular item will be taken care of, etc.  When we are  
ready for the release we'd just bump all unresolved issues to the  
next planned version (e.g. Solr 1.3.1 or 1.4 or Lucene 2.4 or 2.9)


-1.  It doesn't make sense to automatically schedule something to be
fixed in the next version of the product.


I would be +1 on automatically setting the fix version for the current  
unreleased version when an issue is resolved as fixed, though.


-Mike

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Solr 1.3.0 Release Lessons Learned

2008-09-22 Thread Mike Klaas


On 22-Sep-08, at 10:34 AM, Shalin Shekhar Mangar wrote:


I'd like to propose a more pro-active approach to release planning  
by the
community. At any given time, let's have two versions in JIRA. Only  
those
issues which a committer has assigned to himself should be in the  
first

un-released version. All unassigned issues must be kept in the second
un-released version. If a committer assigns and promotes an issue to  
the
first un-released version, he should feel confident enough to  
resolve the
issue one way or another within 3 months of the last release else he  
should
mark it for the second version. At any given time, anybody can call  
a vote
on releasing with the trunk features. If we feel confident enough  
and the
list of resolved issues substantial enough, we can work according to  
our
current way of release planning (deferring open issues, creating a  
branch,

prioritizing bugs, putting up an RC and then release).


I think that this is the right approach, but I don't think that it  
needs to be that complicated.  For issues without the expectation of  
completion that you mention, it is fine to just not assign a version  
to the issue.  It _would_ be useful, OTOH, to have a 2.0 version in  
JIRA for issues we know won't be resolved back-compatibly.


-Mike


[jira] Commented: (SOLR-216) Improvements to solr.py

2008-09-10 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12629981#action_12629981
 ] 

Mike Klaas commented on SOLR-216:
-

That's great!  Be sure to update http://wiki.apache.org/solr/SolPython as the 
project progresses.



 Improvements to solr.py
 ---

 Key: SOLR-216
 URL: https://issues.apache.org/jira/browse/SOLR-216
 Project: Solr
  Issue Type: Improvement
  Components: clients - python
Affects Versions: 1.2
Reporter: Jason Cater
Assignee: Mike Klaas
Priority: Trivial
 Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, 
 solr.py, test_all.py


 I've taken the original solr.py code and extended it to include higher-level 
 functions.
   * Requires python 2.3+
   * Supports SSL (https://) schema
   * Conforms (mostly) to PEP 8 -- the Python Style Guide
   * Provides a high-level results object with implicit data type conversion
   * Supports batching of update commands

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-766) Remove python client from 1.3 distribution

2008-09-10 Thread Mike Klaas (JIRA)
Remove python client from 1.3 distribution
--

 Key: SOLR-766
 URL: https://issues.apache.org/jira/browse/SOLR-766
 Project: Solr
  Issue Type: Task
  Components: clients - python
Affects Versions: 1.3
Reporter: Mike Klaas
Assignee: Mike Klaas
Priority: Blocker
 Fix For: 1.3


see solr-dev thread:

http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL 
PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-766) Remove python client from 1.3 distribution

2008-09-10 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12630004#action_12630004
 ] 

Mike Klaas commented on SOLR-766:
-

JIRA seems to be not allowing me to upload a patch.  Here is the text of the 
proposed README:

Note: As of version 1.3, Solr no longer comes bundled with a Python client.  
The existing client
was not sufficiently maintained or tested as development of Solr progressed, 
and committers
felt that the code was not up to our usual high standards of release.

The client bundled with previous versions of Solr will continue to be available 
indefinitely at:
http://svn.apache.org/viewvc/lucene/solr/tags/release-1.2.0/client/python/

Please see http://wiki.apache.org/solr/SolPython for information on third-party 
Solr python
clients.



 Remove python client from 1.3 distribution
 --

 Key: SOLR-766
 URL: https://issues.apache.org/jira/browse/SOLR-766
 Project: Solr
  Issue Type: Task
  Components: clients - python
Affects Versions: 1.3
Reporter: Mike Klaas
Assignee: Mike Klaas
Priority: Blocker
 Fix For: 1.3


 see solr-dev thread:
 http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL 
 PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-766) Remove python client from 1.3 distribution

2008-09-10 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-766:


Attachment: SOLR-766.patch

 Remove python client from 1.3 distribution
 --

 Key: SOLR-766
 URL: https://issues.apache.org/jira/browse/SOLR-766
 Project: Solr
  Issue Type: Task
  Components: clients - python
Affects Versions: 1.3
Reporter: Mike Klaas
Assignee: Mike Klaas
Priority: Blocker
 Fix For: 1.3

 Attachments: SOLR-766.patch


 see solr-dev thread:
 http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL 
 PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr's use of Lucene's Compression field

2008-09-03 Thread Mike Klaas
Agreed.  It was the simplest thing to do at the time, but it would  
definitely be preferable to offer the much faster lesser levels of  
compression.


-Mike

On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote:

Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception 
, it occurred to me that we probably should refactor Solr's offering  
of compression.  Currently, we rely on Field.COMPRESS from Lucene,  
but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 
, because it only offers the highest level of compression, which is  
also the slowest.


Obviously, Solr needs to handle the compression on the server side.   
I think we should have Solr do the compression, allowing users to  
set the level of compression (maybe even make it pluggable to put in  
your own compression techniques) and then just use Lucene's binary  
field capability.  Granted, this is lower priority since I doubt  
many people use compression to begin with, but, still it would be  
useful.


-Grant
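As a hedged illustration of the point about compression levels: java.util.zip.Deflater exposes levels 1 (fastest) through 9 (best), whereas Field.COMPRESS offers only the highest. The class and method names below are invented for the sketch:

```java
import java.util.zip.Deflater;

// Hypothetical sketch: compress the same payload at the fastest and the
// best (slowest) Deflater levels and report the resulting sizes.
public class CompressionLevels {

    // Returns the number of compressed bytes produced for the input at
    // the given level; the output buffer is sized so one deflate() call
    // is enough for this small, repetitive payload.
    static int compressedSize(byte[] input, int level) {
        Deflater d = new Deflater(level);
        d.setInput(input);
        d.finish();
        byte[] out = new byte[input.length * 2 + 64];
        int n = d.deflate(out);
        d.end();
        return n;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.append("solr lucene compression ");
        byte[] data = sb.toString().getBytes();
        System.out.println("level 1: " + compressedSize(data, Deflater.BEST_SPEED));
        System.out.println("level 9: " + compressedSize(data, Deflater.BEST_COMPRESSION));
    }
}
```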




Re: Solr's use of Lucene's Compression field

2008-09-03 Thread Mike Klaas
Also I see that another Lucene bug (LUCENE-1374) was found relating to  
compressed fields in lucene (when we first added compressed field  
support to solr a lucene bug involving lazy-loaded fields and  
compression was uncovered, too).


It would be good to change the implementation simply to avoid relying  
on a deprecated lucene feature that isn't well exercised in development.


-Mike

On 3-Sep-08, at 11:36 AM, Mike Klaas wrote:

Agreed.  It was the simplest thing to do at the time, but it would  
definitely be preferable to offer the much faster lesser levels of  
compression.


-Mike

On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote:

Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception 
, it occurred to me that we probably should refactor Solr's  
offering of compression.  Currently, we rely on Field.COMPRESS from  
Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 
, because it only offers the highest level of compression, which is  
also the slowest.


Obviously, Solr needs to handle the compression on the server  
side.  I think we should have Solr do the compression, allowing  
users to set the level of compression (maybe even make it pluggable  
to put in your own compression techniques) and then just use  
Lucene's binary field capability.  Granted, this is lower priority  
since I doubt many people use compression to begin with, but, still  
it would be useful.


-Grant






[jira] Commented: (SOLR-739) Add support for OmitTf

2008-08-29 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12627049#action_12627049
 ] 

Mike Klaas commented on SOLR-739:
-

Haven't looked at the patch, but defaulting to omitTf=true is 
backwards-incompatible (think multi-valued string fields)

 Add support for OmitTf
 --

 Key: SOLR-739
 URL: https://issues.apache.org/jira/browse/SOLR-739
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-739.patch


 Allow setting omitTf in the field schema. Default to true for all but text 
 fields.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: 1.3 status

2008-08-25 Thread Mike Klaas

+1 for 1.3 RC.

The idea of putting new issues in 1.3.1 has been tossed around a few  
times on this list in the last few weeks.   I'm not sure how other  
people feel about this, but in my mind, 1.X.Y and 1.X.Z releases  
should be feature-identical, with later releases only containing  
bugfixes.  If we have a bunch of cool features we want to release  
shortly, I'd be happy with releasing 1.4 quickly :)


-Mike

On 25-Aug-08, at 7:30 AM, Shalin Shekhar Mangar wrote:


+1 for Lucene upgrade
+1 for a release candidate.

I think the newer issues can make it to 1.3.1 easily. We don't need  
to halt

1.3 for them.

A general question -- how long does a Release Candidate phase last?

On Mon, Aug 25, 2008 at 7:51 PM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:


+1 for Lucene upgrade
+1 for a release (I *think* none of the recent SOLR-7** issues have  
to go

in 1.3)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Erik Hatcher [EMAIL PROTECTED]
To: solr-dev@lucene.apache.org
Sent: Monday, August 25, 2008 10:06:46 AM
Subject: Re: 1.3 status


On Aug 25, 2008, at 9:48 AM, Yonik Seeley wrote:

Given that there are backward compat concerns with
https://issues.apache.org/jira/browse/LUCENE-1142
perhaps we should update Lucene again before a release?


+1

   Erik






--
Regards,
Shalin Shekhar Mangar.




Re: [jira] Closed: (LUCENE-1363) sub task of reopen performance

2008-08-22 Thread Mike Klaas

Wow, that was a fast resolution to this issue :)

-Mike

On 22-Aug-08, at 12:46 AM, F.Y. (JIRA) wrote:



[ https://issues.apache.org/jira/browse/LUCENE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
 ]


F.Y. closed LUCENE-1363.


   Resolution: Fixed


sub task of reopen performance
--

   Key: LUCENE-1363
   URL: https://issues.apache.org/jira/browse/LUCENE-1363
   Project: Lucene - Java
Issue Type: Sub-task
   Environment: win
  Reporter: F.Y.




--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (SOLR-474) audit docs for Spellchecker

2008-08-14 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12622677#action_12622677
 ] 

Mike Klaas commented on SOLR-474:
-

The issue is more wikidocs vs. behaviour.  I apologize I haven't gotten to this 
yet -- I've been suffering from RSI for the last month or so and it has been 
difficult to get non-work computer time.  I'll take a look today.

 audit docs for Spellchecker
 ---

 Key: SOLR-474
 URL: https://issues.apache.org/jira/browse/SOLR-474
 Project: Solr
  Issue Type: Task
Affects Versions: 1.3
Reporter: Hoss Man
Assignee: Mike Klaas
 Fix For: 1.3


 according to this troubling comment from Mike, the spellchecker handler 
 javadocs (and wiki) may not reflect reality...
 http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712
 {quote}
 Multi-word spell checking is available only with extendedResults=true, and 
 only in trunk.  I
 believe that the current javadocs are incorrect on this point.
 {quote}
 we should audit/fix this before 1.3

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-474) audit docs for Spellchecker

2008-08-14 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-474.
-

Resolution: Fixed

I've verified the behaviour and updated the wiki page accordingly.

 audit docs for Spellchecker
 ---

 Key: SOLR-474
 URL: https://issues.apache.org/jira/browse/SOLR-474
 Project: Solr
  Issue Type: Task
Affects Versions: 1.3
Reporter: Hoss Man
Assignee: Mike Klaas
 Fix For: 1.3


 according to this troubling comment from Mike, the spellchecker handler 
 javadocs (and wiki) may not reflect reality...
 http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712
 {quote}
 Multi-word spell checking is available only with extendedResults=true, and 
 only in trunk.  I
 believe that the current javadocs are incorrect on this point.
 {quote}
 we should audit/fix this before 1.3

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-216) Improvements to solr.py

2008-08-13 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12622391#action_12622391
 ] 

Mike Klaas commented on SOLR-216:
-

Hi Dariusz,

There will almost certainly be no more releases of Solr 1.2.  1.3 will likely 
be released in less than a month.  However, it is good that you published this 
code so that it can be found by other parties.

I'd be much more interested in working toward a client that is compatible with 
the upcoming 1.3 release (it is unlikely that it can be included, but it can be 
distributed separately).

cheers,
-Mike

 Improvements to solr.py
 ---

 Key: SOLR-216
 URL: https://issues.apache.org/jira/browse/SOLR-216
 Project: Solr
  Issue Type: Improvement
  Components: clients - python
Affects Versions: 1.2
Reporter: Jason Cater
Assignee: Mike Klaas
Priority: Trivial
 Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, 
 solr.py, test_all.py


 I've taken the original solr.py code and extended it to include higher-level 
 functions.
   * Requires python 2.3+
   * Supports SSL (https://) schema
   * Conforms (mostly) to PEP 8 -- the Python Style Guide
   * Provides a high-level results object with implicit data type conversion
   * Supports batching of update commands

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: ClientUtils escape query

2008-08-05 Thread Mike Klaas

Wouldn't you want to reverse all escaping in that case anyway?

-Mike

On 5-Aug-08, at 1:45 PM, Grant Ingersoll wrote:

It's mainly a problem when one wants to display the thing later, I  
guess.


-Grant

On Aug 5, 2008, at 4:16 PM, Ryan McKinley wrote:

That came after I spent a week increasing the list of things that  
need to be escaped, one at a time (waiting for errors along the way...)


Erik suggested I look at how the ruby client handles it... and I  
haven't seen any problem since then.


Is there any problem with over-escaping?  I know it makes some  
things look funny.  Perhaps there is a regex that will do any non-letter 
except


ryan


On Aug 5, 2008, at 8:28 AM, Grant Ingersoll wrote:

ClientUtils.escapeQueryChars seems a bit aggressive to me in terms  
of what it escapes.  It references http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping 
 Special Characters, but doesn't explicitly escape them, instead  
opting for the more general \W regex.  Thus, I'm noticing that  
chars that don't need to be escaped ( like / ) are being escaped.


Anyone recall why this is?  I suppose the problem comes in when  
one considers other query parsers, but maybe we should just mark  
this one as explicitly for use w/ the Lucene QP?


-Grant
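A sketch of the alternative Grant raises -- escaping only the characters the Lucene query parser documents as special, rather than every non-word character. The class name and the exact character set here are illustrative assumptions, not the actual ClientUtils code:

```java
import java.util.regex.Pattern;

// Hypothetical escaper that targets only the Lucene query parser's
// documented special characters, so innocuous characters like '/'
// pass through unescaped (unlike the blanket \W approach).
public class QueryEscape {

    // + - ! ( ) : ^ [ ] " { } ~ * ? | & \  (singles of && and || included)
    private static final Pattern SPECIAL =
            Pattern.compile("([+\\-!():^\\[\\]\"{}~*?|&\\\\])");

    static String escape(String s) {
        // Prefix each matched special character with a backslash.
        return SPECIAL.matcher(s).replaceAll("\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escape("a/b")); // a/b  -- '/' left alone
        System.out.println(escape("a+b")); // a\+b
    }
}
```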









Re: AutoCommitTest

2008-08-05 Thread Mike Klaas

On 5-Aug-08, at 3:32 PM, Yonik Seeley wrote:


AutoCommitTest was failing for me a good percentage of the time...
the comment suggested that adding another doc after the commit
callback would block until the new searcher was registered.  But
that's not the case.  I've hacked the test for now to just sleep(500)
after the commit callback.


Fair enough.  It is difficult for me to fix this more permanently,  
since I can't get it to fail on local machines.


I deleted a bunch of email recently so I checked nabble--it seems that  
in the last month that AutoCommitTest has failed once in Hudson (July  
21) and once in the apache build (August 2).  That isn't too bad, but  
I hope that your change eliminates those entirely.


-Mike


Re: [jira] Issue Comment Edited: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-29 Thread Mike Klaas


On 29-Jul-08, at 3:20 AM, Andrew Savory wrote:


Actually I'd argue that all such technical discussion would be better
done on the mailing list rather than through JIRA. Mail clients are
designed for threaded discussions far better than JIRA's web GUI. And
JIRA's posting back to the list with bq. makes most responses
impossible to follow. Excessive use of JIRA feels like a community
antipattern to me.


+1

-Mike


[jira] Commented: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-28 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617512#action_12617512
 ] 

Mike Klaas commented on SOLR-665:
-

I haven't looked at the proposed code at all, but it _is_ possible to design 
this kind of datastructure, with much care:

http://www.ddj.com/hpc-high-performance-computing/208801974


 FIFO Cache (Unsynchronized): 9x times performance boost
 ---

 Key: SOLR-665
 URL: https://issues.apache.org/jira/browse/SOLR-665
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
 Environment: JRockit R27 (Java 6)
Reporter: Fuad Efendi
 Attachments: FIFOCache.java

   Original Estimate: 672h
  Remaining Estimate: 672h

 Attached is a modified version of LRUCache where 
 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that 
 access-order/true (the performance bottleneck of LRU) is replaced with 
 insertion-order/false (so that it becomes FIFO)
 2. Almost all (absolutely unnecessary) synchronized statements are commented out
 See discussion at 
 http://www.nabble.com/LRUCache---synchronized%21--td16439831.html
 Performance metrics (taken from SOLR Admin):
 LRU
 Requests: 7638
 Average Time-Per-Request: 15300
 Average Request-per-Second: 0.06
 FIFO:
 Requests: 3355
 Average Time-Per-Request: 1610
 Average Request-per-Second: 0.11
 Performance increased 9 times, which roughly corresponds to the number of CPUs in 
 a system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org)
 Current number of documents: 7494689
 name:  filterCache  
 class:org.apache.solr.search.LRUCache  
 version:  1.0  
 description:  LRU Cache(maxSize=1000, initialSize=1000)  
 stats:lookups : 15966954582
 hits : 16391851546
 hitratio : 0.102
 inserts : 4246120
 evictions : 0
 size : 2668705
 cumulative_lookups : 16415839763
 cumulative_hits : 16411608101
 cumulative_hitratio : 0.99
 cumulative_inserts : 4246246
 cumulative_evictions : 0 
 Thanks

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-28 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617549#action_12617549
 ] 

Mike Klaas commented on SOLR-665:
-

{quote}We may simply use java.util.concurrent.locks instead of heavy 
synchronized... we may also use Executor framework instead of single-thread 
faceting... We may even base SOLR on Apache MINA project.{quote}

Simply replacing synchronized with java.util.concurrent.locks doesn't increase 
performance.  There needs to be a specific strategy for employing these locks 
in a way that makes sense.

For instance, one idea would be to create a read/write lock with the put()'s 
covered by write and get()'s covered by read.  This would allow multiple 
parallel reads and will be thread-safe.  Another is to create something like 
ConcurrentLinkedHashMap.

These strategies should be tested before trying to create a lock-free get() 
version, which, if even possible, would rely deeply on the implementation (such 
a structure would have to be created from scratch, I believe).  I'd expect 
anyone who is able to create such a thing to be familiar enough with memory 
barriers and related issues to be able to deeply explain the problems with 
double-checked locking off the top of their head (and immediately see such 
problems in other code).
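The read/write-lock idea above can be sketched as follows. This is a hypothetical illustration (class name invented, not the attached FIFOCache.java): put() takes the write lock, get() takes the read lock. Note that this split is only safe for an insertion-ordered map -- an access-ordered (LRU) LinkedHashMap mutates its links on get(), so reads there would still need exclusive locking:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical FIFO cache guarded by a ReadWriteLock: many threads may
// read concurrently; writers get exclusive access.
public class RwLockFifoCache<K, V> {
    private final Map<K, V> map;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public RwLockFifoCache(final int maxSize) {
        // accessOrder=false => insertion order; get() does not mutate
        // the map, so a shared read lock is sufficient for lookups.
        this.map = new LinkedHashMap<K, V>(maxSize, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // FIFO eviction
            }
        };
    }

    public V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(K key, V value) {
        lock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public int size() {
        lock.readLock().lock();
        try {
            return map.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RwLockFifoCache<String, Integer> cache = new RwLockFifoCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3); // evicts "a", the oldest insertion
        System.out.println(cache.get("a")); // null
        System.out.println(cache.size());   // 2
    }
}
```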

 FIFO Cache (Unsynchronized): 9x times performance boost
 ---

 Key: SOLR-665
 URL: https://issues.apache.org/jira/browse/SOLR-665
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
 Environment: JRockit R27 (Java 6)
Reporter: Fuad Efendi
 Attachments: FIFOCache.java

   Original Estimate: 672h
  Remaining Estimate: 672h

 Attached is a modified version of LRUCache where 
 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that 
 access-order/true (the performance bottleneck of LRU) is replaced with 
 insertion-order/false (so that it becomes FIFO)
 2. Almost all (absolutely unnecessary) synchronized statements are commented out
 See discussion at 
 http://www.nabble.com/LRUCache---synchronized%21--td16439831.html
 Performance metrics (taken from SOLR Admin):
 LRU
 Requests: 7638
 Average Time-Per-Request: 15300
 Average Request-per-Second: 0.06
 FIFO:
 Requests: 3355
 Average Time-Per-Request: 1610
 Average Request-per-Second: 0.11
 Performance increased 9 times, which roughly corresponds to the number of CPUs in 
 a system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org)
 Current number of documents: 7494689
 name:  filterCache  
 class:org.apache.solr.search.LRUCache  
 version:  1.0  
 description:  LRU Cache(maxSize=1000, initialSize=1000)  
 stats:lookups : 15966954582
 hits : 16391851546
 hitratio : 0.102
 inserts : 4246120
 evictions : 0
 size : 2668705
 cumulative_lookups : 16415839763
 cumulative_hits : 16411608101
 cumulative_hitratio : 0.99
 cumulative_inserts : 4246246
 cumulative_evictions : 0 
 Thanks

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-474) audit docs for Spellchecker

2008-07-28 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617580#action_12617580
 ] 

Mike Klaas commented on SOLR-474:
-

I will look at this before release.

 audit docs for Spellchecker
 ---

 Key: SOLR-474
 URL: https://issues.apache.org/jira/browse/SOLR-474
 Project: Solr
  Issue Type: Task
Affects Versions: 1.3
Reporter: Hoss Man
Assignee: Mike Klaas
 Fix For: 1.3


 according to this troubling comment from Mike, the spellchecker handler 
 javadocs (and wiki) may not reflect reality...
 http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712
 {quote}
 Multi-word spell checking is available only with extendedResults=true, and 
 only in trunk.  I
 believe that the current javadocs are incorrect on this point.
 {quote}
 we should audit/fix this before 1.3

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2008-07-24 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12616729#action_12616729
 ] 

Mike Klaas commented on SOLR-139:
-

{quote}David - storing all data in the search index can be a problem because it 
can get BIG. Imagine if nutch stored the raw content in the lucene index? (I 
may be wrong on this) even with Lazy loading, there is a query time cost to 
having stored fields.{quote}

Splitting it out into another store is much better at scale.  A distinct lucene 
index works relatively well.



 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Defining properties/using expressions in {multicore, config, schema} files

2008-07-21 Thread Mike Klaas


On 21-Jul-08, at 10:48 AM, Henrib wrote:



I posted a new patch in solr-350 (solr-350-properties.patch) that  
allows
defining properties in multicore.xml and using them in expressions  
in config

 schema files. This brings a lot of flexibility to configuration.

I apologize for doubling the JIRA post; Solr-350 being closed, I  
just wanted
to ensure anyone interested in the feature could try/comment/review/ 
etc.


Perhaps opening a new issue would be best?

cheers,
-Mike


Re: Welcome Shalin Shekhar Mangar

2008-07-20 Thread Mike Klaas

Welcome aboard, Shalin!

-Mike

On 19-Jul-08, at 12:01 PM, Shalin Shekhar Mangar wrote:


Thanks!

I work at AOL in Bangalore as part of a small team which gets to  
work on a
variety of (very cool!) stuff. Though my involvement started when we  
decided
to contribute part of our work to Solr (DataImportHandler), it soon  
became a
personal passion and has remained so since. AOL continues to  
encourage and

support me for which I'm thankful.

I'm very happy to be a part of this community and I'm looking  
forward to

working more closely with you all.

On Sat, Jul 19, 2008 at 1:12 AM, Grant Ingersoll [EMAIL PROTECTED]
wrote:


I am pleased to announce that the Lucene PMC has named Shalin Shekhar
Mangar as a Solr committer.  Shalin has already contributed  
numerous patches

to the community as well as answers and help on the user list.

Shalin, tradition has it that new committers introduce themselves a  
little
bit, so feel free to drop a note about where you work, etc. if you  
are so

inclined.

Thanks,
Grant





--
Regards,
Shalin Shekhar Mangar.




[jira] Updated: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field

2008-07-07 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-610:


Fix Version/s: 1.3

 Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
 ---

 Key: SOLR-610
 URL: https://issues.apache.org/jira/browse/SOLR-610
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch


 Add support for specifying negative values for the hl.maxAnalyzedChars 
 parameter to be able to highlight the whole field without having to know its 
 size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field

2008-07-07 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-610.
-

Resolution: Fixed

Committed.  Thanks, Lars!

 Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
 ---

 Key: SOLR-610
 URL: https://issues.apache.org/jira/browse/SOLR-610
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch


 Add support for specifying negative values for the hl.maxAnalyzedChars 
 parameter to be able to highlight the whole field without having to know its 
 size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-07-07 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-556.
-

Resolution: Fixed

Committed as part of SOLR-610.  Thanks, Lars!

 Highlighting of multi-valued fields returns snippets which span multiple 
 different values
 -

 Key: SOLR-556
 URL: https://issues.apache.org/jira/browse/SOLR-556
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-556-highlight-multivalued.patch, 
 solr-highlight-multivalued-example.xml


 When highlighting multi-valued fields, the highlighter sometimes returns 
 snippets which span multiple values, e.g. with values "foo" and "bar" and 
 search term "ba" the highlighter will create the snippet "foo<em>ba</em>r". 
 Furthermore it sometimes returns smaller snippets than it should, e.g. with 
 value "foobar" and search term "oo" it will create the snippet "<em>oo</em>" 
 regardless of hl.fragsize.
 I have been unable to determine the real cause for this, or indeed what 
 actually goes on at all. To reproduce the problem, I've used the following 
 steps:
 * create an index with multi-valued fields, one document should have at least 
 3 values for these fields (in my case strings of length between 5 and 15 
 Japanese characters -- as far as I can tell plain old ASCII should produce 
 the same effect though)
 * search for part of a value in such a field with highlighting enabled, the 
 additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, 
 hl.mergeContiguous=true (changing the parameters does not seem to have any 
 effect on the result though)
 * highlighted snippets should show effects described above




[jira] Assigned: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field

2008-07-07 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas reassigned SOLR-610:
---

Assignee: Mike Klaas

 Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
 ---

 Key: SOLR-610
 URL: https://issues.apache.org/jira/browse/SOLR-610
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch


 Add support for specifying negative values for the hl.maxAnalyzedChars 
 parameter to be able to highlight the whole field without having to know its 
 size.




[jira] Commented: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field

2008-06-27 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12608878#action_12608878
 ] 

Mike Klaas commented on SOLR-610:
-

Hi Lars,

I was planning on committing SOLR-556.  Would you rather I commit that first, or 
that I produce a unified patch instead?

-Mike

 Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
 ---

 Key: SOLR-610
 URL: https://issues.apache.org/jira/browse/SOLR-610
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Priority: Minor
 Attachments: SOLR-610-maxanalyzed.patch


 Add support for specifying negative values for the hl.maxAnalyzedChars 
 parameter to be able to highlight the whole field without having to know its 
 size.




Re: per-field similarity

2008-06-25 Thread Mike Klaas

On 24-Jun-08, at 1:28 PM, Yonik Seeley wrote:


Something to consider for Lucene 3 is to have something to retrieve
Similarity per-field rather than passing the field name into some
functions...


+1

I've felt that this was the proper (and more useful) way to do  
things for a long time.


(http://markmail.org/message/56bk6wrbwallyjvr)

-Mike

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Updated: (LUCENE-1314) IndexReader.reopen(boolean force)

2008-06-23 Thread Mike Klaas

On 23-Jun-08, at 10:14 AM, Jason Rutherglen (JIRA) wrote:


Does anyone know how to turn off Eclipse automatically changing the  
import statements?  I am not making it reformat but if I edit some  
code in a file it sees fit to reformat the imports.


http://www.google.com/search?q=turn%20off%20eclipse%20changing%20import%20statements


I'm running into a problem where Organize Imports is removing all of  
my import statements. I had to turn off Keep Imports Organized  
because I noticed that ...



-Mike




Re: XSS in Solr admin interface

2008-06-20 Thread Mike Klaas


On 19-Jun-08, at 11:17 PM, Nicob wrote:


On Thursday, 19 June 2008 at 19:21 -0700, Mike Klaas wrote:


Fixed in r669766.


I checked the patch and it's correctly patching this XSS.
Thanks to the dev team !


Thanks for the report!

-Mike

Re: XSS in Solr admin interface

2008-06-19 Thread Mike Klaas


On 19-Jun-08, at 5:47 PM, Yonik Seeley wrote:


On Thu, Jun 19, 2008 at 7:42 PM, Nicob [EMAIL PROTECTED] wrote:
while testing the Solr search engine, I found a XSS vulnerability  
in its
administration interface. I wrote to [EMAIL PROTECTED], but I  
wonder
if this list could be a better place to find a security contact of  
the

Solr project.


This is definitely the right list.
Is this vulnerability in the current dev version of solr?


Fixed in r669766.

-Mike


[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter

2008-06-16 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605403#action_12605403
 ] 

Mike Klaas commented on SOLR-14:


Note that it is very easy to use an external TokenFilter, so you could just cp 
WDF into your own class and make the changes.

(Though I'm not saying that this _shouldn't_ make it in for 1.3)

 Add the ability to preserve the original term when using WordDelimiterFilter
 

 Key: SOLR-14
 URL: https://issues.apache.org/jira/browse/SOLR-14
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Richard Trey Hyde
 Attachments: TokenizerFactory.java, WordDelimiterFilter.patch, 
 WordDelimiterFilter.patch


 When doing prefix searching, you need to hang on to the original term, 
 otherwise you'll miss many matches you should be making.
 Data: ABC-12345
 WordDelimiterFilter may change this into
 ABC 12345 ABC12345
 A user may enter a search such as
  ABC\-123*
 which will fail to find a match given the above scenario.
 The attached patch will allow the use of the preserveOriginal option to 
 WordDelimiterFilter, which will analyse the input as
 ABC 12345 ABC12345 ABC-12345
 in which case we will get a positive match.
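A sketch of how the new option might be enabled in a schema.xml analyzer chain — preserveOriginal comes from the patch, while the field type name and the other filter attributes are illustrative assumptions:

```xml
<fieldType name="text_wdf" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal keeps ABC-12345 alongside ABC, 12345, ABC12345 -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateAll="1" preserveOriginal="1"/>
  </analyzer>
</fieldType>
```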




[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter

2008-06-16 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605410#action_12605410
 ] 

Mike Klaas commented on SOLR-14:


Also, voting for an issue is a good way to increase its visibility

 Add the ability to preserve the original term when using WordDelimiterFilter
 

 Key: SOLR-14
 URL: https://issues.apache.org/jira/browse/SOLR-14
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Richard Trey Hyde
 Attachments: TokenizerFactory.java, WordDelimiterFilter.patch, 
 WordDelimiterFilter.patch


 When doing prefix searching, you need to hang on to the original term, 
 otherwise you'll miss many matches you should be making.
 Data: ABC-12345
 WordDelimiterFilter may change this into
 ABC 12345 ABC12345
 A user may enter a search such as
  ABC\-123*
 which will fail to find a match given the above scenario.
 The attached patch will allow the use of the preserveOriginal option to 
 WordDelimiterFilter, which will analyse the input as
 ABC 12345 ABC12345 ABC-12345
 in which case we will get a positive match.




[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-06-10 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603780#action_12603780
 ] 

Mike Klaas commented on SOLR-556:
-

Thanks for the patch, Lars.  I think that the basic approach is sound, though I 
am a little nervous about the performance implications (especially in the case 
of phrase highlighting, where we spin up an entirely new spanhighlighter for 
each value in a multi-valued field).  I wonder if I am the only one who 
highlights large text fields composed of dozens of individual values?




 Highlighting of multi-valued fields returns snippets which span multiple 
 different values
 -

 Key: SOLR-556
 URL: https://issues.apache.org/jira/browse/SOLR-556
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-556-highlight-multivalued.patch, 
 solr-highlight-multivalued-example.xml


 When highlighting multi-valued fields, the highlighter sometimes returns 
 snippets which span multiple values, e.g. with values "foo" and "bar" and 
 search term "ba" the highlighter will create the snippet "foo<em>ba</em>r". 
 Furthermore it sometimes returns smaller snippets than it should, e.g. with 
 value "foobar" and search term "oo" it will create the snippet "<em>oo</em>" 
 regardless of hl.fragsize.
 I have been unable to determine the real cause for this, or indeed what 
 actually goes on at all. To reproduce the problem, I've used the following 
 steps:
 * create an index with multi-valued fields, one document should have at least 
 3 values for these fields (in my case strings of length between 5 and 15 
 Japanese characters -- as far as I can tell plain old ASCII should produce 
 the same effect though)
 * search for part of a value in such a field with highlighting enabled, the 
 additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, 
 hl.mergeContiguous=true (changing the parameters does not seem to have any 
 effect on the result though)
 * highlighted snippets should show effects described above




[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-06-10 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603785#action_12603785
 ] 

Mike Klaas commented on SOLR-556:
-

Hey Lars,

Yeah, I'm talking about highlighting 15kB of text in 100-200 character chunks.  
Maybe I can whip up a perf test for this soon.

The reason we probably see this issue differently is that the incorrect 
behaviour is quite minor for most users (perhaps a bit of punctuation leaking 
from value to value at most).  One way to correct what you are seeing is to 
use a tokenizer that creates tokens out of the CJK characters, or things on 
boundaries.  In your case, inserting a fake token when encountering a right 
bracket [)] would fix the problem, I think.

Nevertheless, I think I will probably end up committing your patch after 
pondering it some more.
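The boundary-token idea can be illustrated outside of Lucene with a plain-Java sketch: if multi-valued content is joined with an explicit marker token before analysis, a highlight fragment cannot silently bridge two values. The marker string here is hypothetical — any token the analyzer preserves would do:

```java
import java.util.Arrays;
import java.util.List;

public class BoundaryMarker {
    // Join field values with an explicit boundary token so that a
    // highlighter fragment cannot span two adjacent values unnoticed.
    static String joinWithBoundary(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(" zz_value_boundary_zz ");
            sb.append(values.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: foo zz_value_boundary_zz bar
        System.out.println(joinWithBoundary(Arrays.asList("foo", "bar")));
    }
}
```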



 Highlighting of multi-valued fields returns snippets which span multiple 
 different values
 -

 Key: SOLR-556
 URL: https://issues.apache.org/jira/browse/SOLR-556
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-556-highlight-multivalued.patch, 
 solr-highlight-multivalued-example.xml


 When highlighting multi-valued fields, the highlighter sometimes returns 
 snippets which span multiple values, e.g. with values "foo" and "bar" and 
 search term "ba" the highlighter will create the snippet "foo<em>ba</em>r". 
 Furthermore it sometimes returns smaller snippets than it should, e.g. with 
 value "foobar" and search term "oo" it will create the snippet "<em>oo</em>" 
 regardless of hl.fragsize.
 I have been unable to determine the real cause for this, or indeed what 
 actually goes on at all. To reproduce the problem, I've used the following 
 steps:
 * create an index with multi-valued fields, one document should have at least 
 3 values for these fields (in my case strings of length between 5 and 15 
 Japanese characters -- as far as I can tell plain old ASCII should produce 
 the same effect though)
 * search for part of a value in such a field with highlighting enabled, the 
 additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, 
 hl.mergeContiguous=true (changing the parameters does not seem to have any 
 effect on the result though)
 * highlighted snippets should show effects described above




Re: Solr Maven Artifacts

2008-06-09 Thread Mike Klaas
As someone who is completely ignorant (and admittedly, somewhat  
willfully so) of the java enterprise world, I was hoping that someone  
more savvy in the ways of maven would step in here.  It is even  
unclear to me what having the project in a Maven repository means for  
people, or why it would be convenient.


Based on the link you sent, it seems that a few things are necessary  
for this to proceed, like a maven project descriptor for Solr (or is  
that already done?).


That said, I'm +1 on steps to better propagate Solr, even if I don't  
think that I am the best person to effectuate those steps.


-Mike


On 9-Jun-08, at 12:58 AM, Andrew Savory wrote:


Hi,

Would any of the solr devs care to comment? It would be extremely  
useful to
have maven artifacts published for those building apps based on Solr  
1.2,
and it would help prepare the way for releasing Solr 1.3 maven  
artifacts.



2008/6/5 Andrew Savory [EMAIL PROTECTED]:


Hi,

2008/6/4 Andrew Savory [EMAIL PROTECTED]:


I see from http://issues.apache.org/jira/browse/SOLR-19 that some
tentative work has been done on mavenisation of solr, and from
https://issues.apache.org/jira/browse/SOLR-586 that discussion of
publishing maven artifacts ... is it possible to push solr 1.2 maven
artifacts out to the repo?



More specifically, would someone with sufficient privileges  
(Yonik?) be

willing to do the following (from [1]):

mkdir -p org.apache.solr/jars

grab the solr-1.2 release (or svn co tags/release-1.2.0, but then  
you need

to edit build.xml to update the version string that seems to have
accidentally been updated before doing the release tag, to change

<property name="version" value="1.2.1-dev" />)

tar xzvf apache-solr-1.2.0.tar.gz

cp apache-solr-1.2.0/dist/apache-solr-1.2.0.jar org.apache.solr/jars/

cd into org.apache.solr/jars and create md5 and sha1 checksums of
apache-solr-1.2.0.jar:
openssl md5 < apache-solr-1.2.0.jar > apache-solr-1.2.0.jar.md5
openssl sha < apache-solr-1.2.0.jar > apache-solr-1.2.0.jar.sha1

sign the release:
gpg --armor --output apache-solr-1.2.0.jar.asc --detach-sig
apache-solr-1.2.0.jar

cd ../ and scp it onto people.apache.org:
scp -r org.apache.solr [EMAIL PROTECTED]:/www/people.apache.org/repo/m1-ibiblio-rsync-repository/

check permissions:
cd /www/people.apache.org/repo/m1-ibiblio-rsync-repository/org.apache.solr

chgrp -R apcvs *
chmod -R g+w *


I could do it but I suspect that would be overstepping the bounds  
of a

non-committer :-)

This will make it easier for anyone to use solr from within maven.  
I'll
file a patch to automate whatever can be automated from our ant  
build so

this is easier for the 1.3 release.

If people agree that publishing maven artifacts is a good idea, I'll
happily update http://wiki.apache.org/solr/HowToRelease to point to  
the

relevant information too.


[1] http://www.apache.org/dev/release-publishing.html#maven-repo




Andrew.
--
[EMAIL PROTECTED] / [EMAIL PROTECTED]
http://www.andrewsavory.com/




[jira] Commented: (SOLR-536) Automatic binding of results to Beans (for solrj)

2008-06-05 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602744#action_12602744
 ] 

Mike Klaas commented on SOLR-536:
-

 This is expensive:
 private final Map<Class, List<DocField>> infocache =
     Collections.synchronizedMap(new HashMap<Class, List<DocField>>());

 Let us make it:
 private final Map<Class, List<DocField>> infocache =
     new ConcurrentHashMap<Class, List<DocField>>();

Expensive?  I'd expect the synchronizedMap to be faster and more memory 
compact.  The ConcurrentHashMap is definitely more concurrent, though.
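For readers weighing the two constructions, a self-contained sketch of the trade-off (class and method names here are illustrative, not the solrj patch):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheChoice {
    // Wraps every call in a single lock on the whole map: simple and
    // memory-compact, but all threads contend on one monitor.
    static Map<Class<?>, List<String>> makeSynchronizedCache() {
        return Collections.synchronizedMap(new HashMap<Class<?>, List<String>>());
    }

    // Allows concurrent reads and striped writes: scales better under
    // contention, at the cost of some extra memory per map.
    static Map<Class<?>, List<String>> makeConcurrentCache() {
        return new ConcurrentHashMap<Class<?>, List<String>>();
    }

    public static void main(String[] args) {
        List<String> fields = new ArrayList<String>();
        fields.add("id");
        Map<Class<?>, List<String>> sync = makeSynchronizedCache();
        Map<Class<?>, List<String>> conc = makeConcurrentCache();
        sync.put(String.class, fields);
        conc.put(String.class, fields);
        // Single-threaded behaviour is identical; the two differ only in
        // locking strategy under concurrent access.
        System.out.println(sync.get(String.class).equals(conc.get(String.class)));
    }
}
```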



 Automatic binding of results to Beans (for solrj)
 -

 Key: SOLR-536
 URL: https://issues.apache.org/jira/browse/SOLR-536
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-536.patch, SOLR-536.patch, SOLR-536.patch


 As we are using Java 5, we can use annotations to bind SolrDocument to Java 
 beans directly.
 This can make the usage of solrj a bit simpler.
 The QueryResponse class in solrj can have an extra method as follows:
 public <T> List<T> getResultBeans(Class<T> klass)
 and the bean can have annotations as:
 class MyBean {
   @Field("id") // name is optional
   String id;
   @Field("category")
   List<String> categories;
 }
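As a rough, self-contained sketch of how annotation-driven binding of this sort can work — this is illustrative reflection code, not the attached patch, and the annotation is named SolrField here only to avoid clashing with java.lang.reflect.Field:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class BeanBindingSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SolrField { String value() default ""; }

    static class MyBean {
        @SolrField("id") String id;
    }

    // Copy matching document entries into annotated bean fields.
    static <T> T bind(Map<String, Object> doc, Class<T> klass) {
        try {
            T bean = klass.getDeclaredConstructor().newInstance();
            for (Field f : klass.getDeclaredFields()) {
                SolrField ann = f.getAnnotation(SolrField.class);
                if (ann == null) continue;
                String name = ann.value().isEmpty() ? f.getName() : ann.value();
                f.setAccessible(true);
                f.set(bean, doc.get(name));
            }
            return bean;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<String, Object>();
        doc.put("id", "doc-1");
        // prints: doc-1
        System.out.println(bind(doc, MyBean.class).id);
    }
}
```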




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-05 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602828#action_12602828
 ] 

Mike Klaas commented on SOLR-572:
-

[quote]Another use case is where Solr is used with indices that are not indices 
for a narrow domain or that don't have nice, clean, short fields that can be 
used for populating the SC index. For example, if the index consists of a pile 
of web pages, I don't think I'd want to use their data (not even their titles) 
to populate the SC index. I'd really want just a plain dictionary-powered 
SCRH.[/quote]

It works great, actually.  That way you get all the abbreviations, jargon, 
proper names, etc.  Thresholding helps prevent most of the cruft from appearing 
in the index.

 Spell Checker as a Search Component
 ---

 Key: SOLR-572
 URL: https://issues.apache.org/jira/browse/SOLR-572
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
 SOLR-572.patch, SOLR-572.patch


 Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
 following features:
 * Allow creating a spell index on a given field and make it possible to have 
 multiple spell indices -- one for each field
 * Give suggestions on a per-field basis
 * Given a multi-word query, give only one consistent suggestion
 * Process the query with the same analyzer specified for the source field and 
 process each token separately
 * Allow the user to specify minimum length for a token (optional)
 Consistency criteria for a multi-word query can consist of the following:
 * Preserve the correct words in the original query as it is
 * Never give duplicate words in a suggestion




[jira] Updated: (SOLR-284) Parsing Rich Document Types

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-284:


Affects Version/s: (was: 1.3)

Removing from 1.3.  No committer has taken ownership.

(It might make sense as a contrib, but I can see the argument for not 
duplicating tika)

 Parsing Rich Document Types
 ---

 Key: SOLR-284
 URL: https://issues.apache.org/jira/browse/SOLR-284
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Eric Pugh
 Fix For: 1.3

 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
 rich.patch, source.zip, test-files.zip, test-files.zip, test.zip


 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
 that supports streaming a Word, PowerPoint, Excel, or PDF document into 
 Solr.
 There is a wiki page with information here: 
 http://wiki.apache.org/solr/UpdateRichDocuments
  




[jira] Updated: (SOLR-435) QParser must validate existance/absense of q parameter

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-435:


Fix Version/s: (was: 1.3)

 QParser must validate existance/absense of q parameter
 

 Key: SOLR-435
 URL: https://issues.apache.org/jira/browse/SOLR-435
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Ryan McKinley

 Each QParser should check whether q exists or not.  For some it will be 
 required, for others not.
 currently it throws a null pointer:
 {code}
 java.lang.NullPointerException
   at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36)
   at 
 org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
   at org.apache.solr.search.QParser.getQuery(QParser.java:80)
   at 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67)
   at 
 org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150)
 ...
 {code}
 see:
 http://www.nabble.com/query-parsing-error-to14124285.html#a14140108




[jira] Updated: (SOLR-433) MultiCore and SpellChecker replication

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-433:


Fix Version/s: (was: 1.3)

 MultiCore and SpellChecker replication
 --

 Key: SOLR-433
 URL: https://issues.apache.org/jira/browse/SOLR-433
 Project: Solr
  Issue Type: Improvement
  Components: replication, spellchecker
Affects Versions: 1.3
Reporter: Otis Gospodnetic
 Attachments: RunExecutableListener.patch, solr-433.patch, 
 spellindexfix.patch


 With MultiCore functionality coming along, it looks like we'll need to be 
 able to:
   A) snapshot each core's index directory, and
   B) replicate any and all cores' complete data directories, not just their 
 index directories.
 Pulled from the spellchecker and multi-core index replication thread - 
 http://markmail.org/message/pj2rjzegifd6zm7m
 Otis:
 I think that makes sense - distribute everything for a given core, not just 
 its index.  And the spellchecker could then also have its data dir (and only 
 index/ underneath really) and be replicated in the same fashion.
 Right?
 Ryan:
 Yes, that was my thought.  If an arbitrary directory could be distributed, 
 then you could have
   /path/to/dist/index/...
   /path/to/dist/spelling-index/...
   /path/to/dist/foo
 and that would all get put into a snapshot.  This would also let you put 
 multiple cores within a single distribution:
   /path/to/dist/core0/index/...
   /path/to/dist/core0/spelling-index/...
   /path/to/dist/core0/foo
   /path/to/dist/core1/index/...
   /path/to/dist/core1/spelling-index/...
   /path/to/dist/core1/foo




[jira] Updated: (SOLR-351) external value source

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-351:


Fix Version/s: (was: 1.3)

 external value source
 -

 Key: SOLR-351
 URL: https://issues.apache.org/jira/browse/SOLR-351
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Attachments: ExternalFileField.patch


 Need a way to rapidly do a bulk update of a single field for use as a 
 component in a function query (no need to be able to search on it).
 Idea: create an ExternalValueSource fieldType that reads its values from a 
 file.  The file could be simple id,val records, and stored in the index 
 directory so it would get replicated.  
 Values could optionally be updated more often than the searcher 
 (hashCode/equals should take this into account to prevent caching issues).




[jira] Updated: (SOLR-284) Parsing Rich Document Types

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-284:


Fix Version/s: (was: 1.3)

 Parsing Rich Document Types
 ---

 Key: SOLR-284
 URL: https://issues.apache.org/jira/browse/SOLR-284
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Eric Pugh
 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
 rich.patch, source.zip, test-files.zip, test-files.zip, test.zip


 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
 that supports streaming a Word, PowerPoint, Excel, or PDF document into 
 Solr.
 There is a wiki page with information here: 
 http://wiki.apache.org/solr/UpdateRichDocuments
  




[jira] Updated: (SOLR-484) Solr Website changes

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-484:


Fix Version/s: (was: 1.3)

 Solr Website changes
 

 Key: SOLR-484
 URL: https://issues.apache.org/jira/browse/SOLR-484
 Project: Solr
  Issue Type: Bug
  Components: documentation
Reporter: Grant Ingersoll
Priority: Minor

 In looking at the Solr website it has many of the same issues that Lucene 
 Java did when it comes to ASF policies about nightly builds, etc. concerning 
 the Javadocs  
 See 
 http://lucene.markmail.org/message/a7k7kujxkhwjwfy6?q=nightly+developer+releases+list:org%2Eapache%2Elucene%2Ejava-dev+from:%22Doug+Cutting+(JIRA)%22page=1
 and 
 http://lucene.markmail.org/message/vaks6omed4l6buth?q=nightly+developer+releases+list:org%2Eapache%2Elucene%2Ejava-dev+from:%22Doug+Cutting+(JIRA)%22page=1
 This would suggest a change like Hadoop and Lucene Java did to separate out 
 the main site, release docs (javadocs, any other?) and developer resources.  
 Currently the javadocs on the main page are the nightly and should be made 
 less prominent.




[jira] Updated: (SOLR-84) New Solr logo?

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-84:
---

Fix Version/s: (was: 1.3)

 New Solr logo?
 --

 Key: SOLR-84
 URL: https://issues.apache.org/jira/browse/SOLR-84
 Project: Solr
  Issue Type: Improvement
Reporter: Bertrand Delacretaz
Priority: Minor
 Attachments: logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg, 
 logo-solr-source-files-take2.zip, solr-84-source-files.zip, solr-f.jpg, 
 solr-logo-20061214.jpg, solr-logo-20061218.JPG, solr-logo-20070124.JPG, 
 solr-nick.gif, solr.jpg, sslogo-solr-flare.jpg, sslogo-solr.jpg, 
 sslogo-solr2-flare.jpg, sslogo-solr2.jpg, sslogo-solr3.jpg


 Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at here) 
 sarraux-dessous.ch) has reworked his logo proposal to be more solar.
 This can either be the start of a logo contest, or if people like it we could 
 adopt it. The gradients can make it a bit hard to integrate, not sure if this 
 is really a problem.
 WDYT?




[jira] Commented: (SOLR-410) Audit the new ResponseBuilder class

2008-06-05 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602834#action_12602834
 ] 

Mike Klaas commented on SOLR-410:
-

Ryan, can this be closed?

 Audit the new ResponseBuilder class
 ---

 Key: SOLR-410
 URL: https://issues.apache.org/jira/browse/SOLR-410
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
Reporter: Ryan McKinley
 Fix For: 1.3


 In SOLR-281, we added a ResponseBuilder class to help search components 
 communicate with one another.  Before releasing 1.3, we need to make sure 
 this is the best design and that it is an interface we can support in the 
 future.




[jira] Updated: (SOLR-243) Create a hook to allow custom code to create custom IndexReaders

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-243:



Do we still want to target 1.3 here?  (Seems like there is a lot to do before 
it is commit-worthy, based on the comments)

 Create a hook to allow custom code to create custom IndexReaders
 

 Key: SOLR-243
 URL: https://issues.apache.org/jira/browse/SOLR-243
 Project: Solr
  Issue Type: Improvement
  Components: search
 Environment: Solr core
Reporter: John Wang
Assignee: Hoss Man
 Fix For: 1.3

 Attachments: indexReaderFactory.patch, indexReaderFactory.patch, 
 indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, 
 indexReaderFactory.patch, indexReaderFactory.patch


 I have a customized IndexReader and I want to write a Solr plugin to use my 
 derived IndexReader implementation. Currently IndexReader instantiation is 
 hard coded to be: 
 IndexReader.open(path)
 It would be really useful if this is done thru a plugable factory that can be 
 configured, e.g. IndexReaderFactory
 interface IndexReaderFactory {
   IndexReader newReader(String name, String path);
 }
 the default implementation would just return: IndexReader.open(path)
 And in the newSearcher and getSearcher methods in SolrCore class can call the 
 current factory implementation to get the IndexReader instance and then build 
 the SolrIndexSearcher by passing in the reader.
 It would be really nice to add this improvement soon (This seems to be a 
 trivial addition) as our project really depends on this.
 Thanks
 -John
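
The IndexReaderFactory idea quoted above can be sketched in a self-contained way. The stub class below stands in for Lucene's IndexReader so the example compiles on its own, and the registry/open helpers are hypothetical names used only for illustration, not Solr or Lucene API:

```java
import java.util.HashMap;
import java.util.Map;

public class ReaderFactoryDemo {
    // Stand-in for org.apache.lucene.index.IndexReader (illustrative only)
    static class StubReader {
        final String path;
        StubReader(String path) { this.path = path; }
    }

    interface IndexReaderFactory {
        StubReader newReader(String name, String path);
    }

    // Default behaviour: the equivalent of the hard-coded IndexReader.open(path)
    static class DefaultFactory implements IndexReaderFactory {
        public StubReader newReader(String name, String path) {
            return new StubReader(path);
        }
    }

    // A hypothetical registry: SolrCore would look the configured factory up,
    // falling back to the default when none is configured for the core.
    static final Map<String, IndexReaderFactory> registry = new HashMap<>();

    static StubReader open(String name, String path) {
        IndexReaderFactory f = registry.getOrDefault(name, new DefaultFactory());
        return f.newReader(name, path);
    }

    public static void main(String[] args) {
        StubReader r = open("core0", "/var/data/index");
        System.out.println(r.path);  // /var/data/index
    }
}
```

The default factory reproduces the hard-coded open-by-path behaviour; a custom implementation would simply be registered under the core's name.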

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-545) remove MultiCore default core / cleanup DispatchHandlera

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas reassigned SOLR-545:
---

Assignee: Ryan McKinley

assigning 1.3 multicore stuff to Ryan

 remove MultiCore default core / cleanup DispatchHandlera 
 ---

 Key: SOLR-545
 URL: https://issues.apache.org/jira/browse/SOLR-545
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 1.3


 MultiCore should require a core name in the URL.  If the core name is 
 missing, there should be a 404, not a valid core.  That is:
 http://localhost:8983/solr/select?q=*:*  should return 404.
 While we are at it, we should clean up the DispatchHandler.  Perhaps the best 
 approach is to treat single core as multicore with only one core?  As is, the 
 tangle of potential paths is ugly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-489) Added @deprecation Javadoc comments

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas reassigned SOLR-489:
---

Assignee: Mike Klaas

 Added @deprecation Javadoc comments
 ---

 Key: SOLR-489
 URL: https://issues.apache.org/jira/browse/SOLR-489
 Project: Solr
  Issue Type: Bug
  Components: documentation
Reporter: Sean Timm
Assignee: Mike Klaas
Priority: Trivial
 Fix For: 1.3

 Attachments: deprecationDocumentation.patch


 In a number of files, @Deprecation annotations were added without 
 accompanying @deprecation Javadoc comments to explain what to use now.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (SOLR-344) New Java API

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas closed SOLR-344.
---

Resolution: Invalid

Let's move this discussion to the wiki and mailing list.  It isn't really an 
open issue for Solr.

 New Java API
 

 Key: SOLR-344
 URL: https://issues.apache.org/jira/browse/SOLR-344
 Project: Solr
  Issue Type: Improvement
  Components: clients - java, search, update
Affects Versions: 1.3
Reporter: Jonathan Woods
 Attachments: New Java API for Solr.pdf


 The core Solr codebase urgently needs to expose a new Java API designed for 
 use by Java running in Solr's JVM and ultimately by core Solr code itself.  
 This API must be (i) object-oriented ('typesafe'), (ii) self-documenting, 
 (iii) at the right level of granularity, (iv) designed specifically to expose 
 the value which Solr adds over and above Lucene.
 This is an urgent issue for two reasons:
 - Java-Solr integrations represent a use-case which is nearly as important as 
 the core Solr use-case in which non-Java clients interact with Solr over HTTP
 - a significant proportion of questions on the mailing lists are clearly from 
 people who are attempting such integrations right now.
 This point in Solr development - some way out from the 1.3 release - might be 
 the right time to do the development and refactoring necessary to produce 
 this API.  We can do this without breaking any backward compatibility from 
 the point of view of XML/HTTP and JSON-like clients, and without altering the 
 core Solr algorithms which make it so efficient.  If we do this work now, we 
 can significantly speed up the spread of Solr.
 Eventually, this API should be part of core Solr code, not hived off into 
 some separate project nor in a non-first-class package space.  It should be 
 capable of forming the foundation of any new Solr development which doesn't 
 need to delve into low level constructs like DocSet and so on - and any new 
 development which does need to do just that should be a candidate for 
 incorporation into the API at the some level.  Whether or not it will ever be 
 worth re-writing existing code is a matter of opinion; but the Java API 
 should be such that if it had existed before core plug-ins were written, it 
 would have been natural to use it when writing them.
 I've attached a PDF which makes the case for this API.  Apologies for 
 delivering it as an attachment, but I wanted to embed pics and a bit of 
 formatting.
 I'll update this issue in the next few days to give a prototype of this API 
 to suggest what it might look like at present.  This will build on the work 
 already done in Solrj and SearchComponents 
 (https://issues.apache.org/jira/browse/SOLR-281), and will be a patch on an 
 up-to-date revision of Solr trunk.
 [PS:
 1.  Having written most of this, I then properly looked at 
 SearchComponents/SOLR-281 and read 
http://www.nabble.com/forum/ViewPost.jtp?post=11050274&framed=y, which says 
 much the same thing albeit more quickly!  And weeks ago, too.  But this 
 proposal is angled slightly differently:
 - it focusses on the value of creating an API not only for internal Solr 
 consumption, but for local Java clients
 - it focusses on designing a Java API without constantly being hobbled by 
 HTTP-Java
 - it's suggesting that the SearchComponents work should result in a Java API 
 which can be used as much by third party Java as by ResponseBuilder.
 2.  I've made some attempt to address Hoss's point 
 (http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#6551097579454875774)
  - that an API like this would need to maintain enough state e.g. to allow an 
 initial search to later be faceted, highlighted etc without going back to the 
 start each time - but clearly the proof of the pudding will be in the 
 prototype.
 3.  Again, I've just discovered SOLR-212 (DirectSolrConnection).  I think all 
 my comments about Solrj apply to this, useful though it clearly is.]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-200) Scripts don't work when run as root in ~root and su'ing to a user

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-200.
-

Resolution: Won't Fix

It doesn't surprise me that /root as the indexdir and / as solr_home 
doesn't work, being root or not.  I don't think that this is an important case.

 Scripts don't work when run as root in ~root and su'ing to a user
 -

 Key: SOLR-200
 URL: https://issues.apache.org/jira/browse/SOLR-200
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Jürgen Hermann
Priority: Minor

 This patch avoids an error due to permission problems when orig_dir is /root
 -orig_dir=$(pwd)
 -cd ${0%/*}/..
 -solr_root=$(pwd)
 -cd ${orig_dir}
 +solr_root=$(cd ${0%/*}/.. && pwd)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-517) highlighter doesn't work with hl.requireFieldMatch=true on un-optimized index

2008-06-05 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602857#action_12602857
 ] 

Mike Klaas commented on SOLR-517:
-

Koji:  Is this resolved?  I seemed to recall that we brought this up on 
java-dev, but I can't find the thread at the moment.

(I don't think that the right thing to do is remove idf fetching of the terms 
as your patch proposes)

 highlighter doesn't work with hl.requireFieldMatch=true on un-optimized index
 -

 Key: SOLR-517
 URL: https://issues.apache.org/jira/browse/SOLR-517
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.2, 1.3
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: SOLR-517.patch, SOLR-517.patch


 On un-optimized index, highlighter doesn't work with 
 hl.requireFieldMatch=true.
 see:
 http://www.nabble.com/hl.requireFieldMatch-and-idf-td16324482.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-522) analysis.jsp doesn't show payloads created/modified by tokenizers and tokenfilters

2008-06-05 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-522:


Fix Version/s: 1.3

 analysis.jsp doesn't show payloads created/modified by tokenizers and 
 tokenfilters
 --

 Key: SOLR-522
 URL: https://issues.apache.org/jira/browse/SOLR-522
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Tricia Williams
Assignee: Mike Klaas
Priority: Trivial
 Fix For: 1.3

 Attachments: SOLR-522-analysis.jsp.patch, SOLR-522-analysis.jsp.patch

   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 Add payload content to the vebose output of the analysis.jsp page for 
 debugging purposes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-243) Create a hook to allow custom code to create custom IndexReaders

2008-06-05 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602860#action_12602860
 ] 

Mike Klaas commented on SOLR-243:
-

Hi John,

Hoss has marked the issue for 1.3, so it will be in the release.

-Mike

 Create a hook to allow custom code to create custom IndexReaders
 

 Key: SOLR-243
 URL: https://issues.apache.org/jira/browse/SOLR-243
 Project: Solr
  Issue Type: Improvement
  Components: search
 Environment: Solr core
Reporter: John Wang
Assignee: Hoss Man
 Fix For: 1.3

 Attachments: indexReaderFactory.patch, indexReaderFactory.patch, 
 indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, 
 indexReaderFactory.patch, indexReaderFactory.patch


 I have a customized IndexReader and I want to write a Solr plugin to use my 
 derived IndexReader implementation. Currently IndexReader instantiation is 
 hard coded to be: 
 IndexReader.open(path)
 It would be really useful if this were done through a pluggable factory that can be 
 configured, e.g. IndexReaderFactory
 interface IndexReaderFactory{
  IndexReader newReader(String name, String path);
 }
 the default implementation would just return: IndexReader.open(path)
 And the newSearcher and getSearcher methods in the SolrCore class can call the 
 current factory implementation to get the IndexReader instance and then build 
 the SolrIndexSearcher by passing in the reader.
 It would be really nice to add this improvement soon (This seems to be a 
 trivial addition) as our project really depends on this.
 Thanks
 -John

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [important] call for 1.3 planning

2008-06-05 Thread Mike Klaas


On 21-May-08, at 4:45 PM, Mike Klaas wrote:

There seems to be some sort of consensus building that there should  
be a 1.3 release in the near future.  The first step is to figure  
out what we want to finish before it gets released.


The list of JIRA issues currently labeled 1.3 can be found here:

http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12312486 



Let's try to get an assignee for every issue in that list by a week  
from now.  If nobody steps up for an issue in that time, I'll assume  
it is low enough priority to move post-1.3.  This would also be a  
good time to add any issues that you want to champion for 1.3.


That brings us down to 20 issues, with only 2 unassigned: SOLR-424 and  
SOLR-410.  I removed a few of the feature issues with no assignee.


Seems like the big things that need to get done are:
  - componented spellchecking
  - contrib area + data import handler
  - distributed search

-Mike


Re: 3 TokenFilter factories not compatible with 1.2

2008-06-04 Thread Mike Klaas

On 4-Jun-08, at 5:24 PM, Yonik Seeley wrote:


On Wed, Jun 4, 2008 at 7:03 PM, Chris Hostetter
[EMAIL PROTECTED] wrote:

3) Documentation and Education
Since this wasn't exactly a use case we ever advertised, we could punt on
the problem by putting a disclaimer in the CHANGES.txt that anyone directly
constructing those 3 classes should explicitly call inform() on the
instances after calling init.


#3 is obviously the simplest approach as developers, and to be  
quite honest:
probably impacts the fewest total number of people (since there are  
probably

very few people constructing Factory instances themselves)


+1


+1, perhaps also pinging -user to see if there is a sizable group of  
people doing this.


-Mike


[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-06-04 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602541#action_12602541
 ] 

Mike Klaas commented on SOLR-556:
-

Ah, I see what the problem is:  Although it is impossible for tokens from 
different values to appear in the same fragment (due to the semantics of 
MultiValuedTokenFilter), the non-token text (typically, punctuation) from 
different values can bleed into the same fragment, since lucene's highlighter 
can only create a new fragment on token boundaries.

Unfortunately SOLR-553 was committed a day after you submitted your patch, and 
rearranges the code slightly so that it no longer applies.  Could you sync the 
patch with trunk?  I think the basic approach is sound.

 Highlighting of multi-valued fields returns snippets which span multiple 
 different values
 -

 Key: SOLR-556
 URL: https://issues.apache.org/jira/browse/SOLR-556
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: solr-highlight-multivalued-example.xml, 
 solr-highlight-multivalued.patch


 When highlighting multi-valued fields, the highlighter sometimes returns 
 snippets which span multiple values, e.g. with values "foo" and "bar" and 
 search term "ba" the highlighter will create the snippet foo<em>ba</em>r. 
 Furthermore it sometimes returns smaller snippets than it should, e.g. with 
 value "foobar" and search term "oo" it will create the snippet <em>oo</em> 
 regardless of hl.fragsize.
 I have been unable to determine the real cause for this, or indeed what 
 actually goes on at all. To reproduce the problem, I've used the following 
 steps:
 * create an index with multi-valued fields, one document should have at least 
 3 values for these fields (in my case strings of length between 5 and 15 
 Japanese characters -- as far as I can tell plain old ASCII should produce 
 the same effect though)
 * search for part of a value in such a field with highlighting enabled, the 
 additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, 
 hl.mergeContiguous=true (changing the parameters does not seem to have any 
 effect on the result though)
 * highlighted snippets should show effects described above

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

2008-06-03 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602038#action_12602038
 ] 

Mike Klaas commented on SOLR-161:
-

 It is really a Lucene query parser bug, but it wouldn't hurt to do s/(.*)-// 
 as a workaround. Assuming my ed(1) syntax is still  fresh. Regardless, no 
 query string should ever give a stack trace

This might be hard to guarantee.  Already there are four issues detailing 
specific ways that dismax barfs on input.  A lot of the suggestions above 
are of the form of detecting a specific failure mode and correcting it, which 
does not guarantee that you will catch them all.

A robust way to do it is parse the query into an AST using a grammar in a way 
that matches the query as well as possible (dropping the stuff that doesn't 
fit).  Unfortunately, this is duplicative of the lucene parsing logic, and it 
would be nicer to add a relaxed mode to lucene rather than pre-parsing the query.

(The reparse+reassemble method is what we use, btw.  It is written in python 
but it might be possible to translate to java.)
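
As a concrete illustration of the "detect a specific failure mode and correct it" style of workaround discussed above (and of its limits), here is a minimal sketch that strips a dangling trailing operator before the string reaches the query parser. The class and method names are hypothetical, and this catches only the trailing-operator case, exactly the incompleteness the comment warns about:

```java
public class QuerySanitizer {
    // Remove any run of prefix operators (+, -, !) left dangling at the
    // end of the query string, so the parser never sees a bare trailing "-".
    static String stripDanglingOps(String q) {
        return q.replaceAll("[+\\-!]+\\s*$", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripDanglingOps("digging for the truth -"));
        // digging for the truth
    }
}
```

A robust solution would instead parse into an AST with a forgiving grammar, as the comment describes; this sketch only patches one known failure mode.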

 Dangling dash causes stack trace
 

 Key: SOLR-161
 URL: https://issues.apache.org/jira/browse/SOLR-161
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.1.0
 Environment: Java 1.5, Tomcat 5.5.17, Fedora Core 4, Intel
Reporter: Walter Underwood

 I'm running tests from our search logs, and we have a query that ends in a 
 dash. That caused a stack trace.
 org.apache.lucene.queryParser.ParseException: Cannot parse 'digging for the 
 truth -': Encountered EOF at line 1, column 23.
 Was expecting one of:
 "(" ...
 <QUOTED> ...
 <TERM> ...
 <PREFIXTERM> ...
 <WILDTERM> ...
 "[" ...
 "{" ...
 <NUMBER> ...
 
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:127)
   at 
 org.apache.solr.request.DisMaxRequestHandler.handleRequest(DisMaxRequestHandler.java:272)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
   at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (LUCENE-1293) Tweaks to PhraseQuery.explain()

2008-05-29 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600973#action_12600973
 ] 

Mike Klaas commented on LUCENE-1293:


It is meant for debugging, though I have found it so painfully slow in the past 
that I have avoided it on occasion.

The main culprit is the looped next() call in PhraseScorer.explain().  Using 
skipTo() would be faster.
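
The next()-versus-skipTo() point can be illustrated with a stub posting-list iterator. The class below is an illustrative stand-in, not Lucene's Scorer API, and binary search stands in for the skip-list machinery in the real postings:

```java
import java.util.Arrays;

public class SkipToDemo {
    static class StubScorer {
        final int[] docs;   // sorted doc ids in the posting list
        int pos = -1;
        StubScorer(int[] docs) { this.docs = docs; }

        // Visits every document one at a time
        boolean next() { return ++pos < docs.length; }

        // Jumps to the first doc >= target in one step; binary search
        // stands in for Lucene's skip lists
        boolean skipTo(int target) {
            int i = Arrays.binarySearch(docs, Math.max(pos + 1, 0), docs.length, target);
            pos = (i >= 0) ? i : -i - 1;
            return pos < docs.length;
        }

        int doc() { return docs[pos]; }
    }

    public static void main(String[] args) {
        StubScorer s = new StubScorer(new int[]{2, 7, 41, 99, 4000});
        // explain(doc) only needs the entry for one doc, so skipTo can get
        // there without scanning the whole list the way a next() loop does
        s.skipTo(99);
        System.out.println(s.doc());  // 99
    }
}
```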

 Tweaks to PhraseQuery.explain()
 ---

 Key: LUCENE-1293
 URL: https://issues.apache.org/jira/browse/LUCENE-1293
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4
Reporter: Itamar Syn-Hershko
Priority: Minor
 Fix For: 2.4


 The explain() function in PhraseQuery.java is very clumsy and could use many 
 optimizations. Perhaps it is only because it is intended for use while 
 debugging?
 Here's an example:
 {noformat}
   result.addDetail(fieldExpl);
   // combine them
   result.setValue(queryExpl.getValue() * fieldExpl.getValue());
   if (queryExpl.getValue() == 1.0f)
 return fieldExpl;
   return result;
}
 {noformat}
 Can easily be tweaked and become:
 {noformat}
   if (queryExpl.getValue() == 1.0f) {
 return fieldExpl;
   }
   result.addDetail(fieldExpl);
   // combine them
   result.setValue(queryExpl.getValue() * fieldExpl.getValue());
   return result;
   }
 {noformat}
 And thats really just for a start...
 Itamar.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Release of SOLR 1.3

2008-05-23 Thread Mike Klaas


On 20-May-08, at 12:32 PM, Shalin Shekhar Mangar wrote:


+1 for your suggestions Mike.

I'd like to see a few of the smaller issues get committed in 1.3  
such as
SOLR-256 (JMX), SOLR-536 (binding for SolrJ), SOLR-430 (SpellChecker  
support
in SolrJ) etc. Also, SOLR-561 (replication by Solr) would be really  
cool to
have in the next release. Noble and I are working on it and plan to  
give a

patch soon.


Whether something makes it in to this release will depend mostly on  
getting the buy-in and time commitment from one of the committers  
familiar with that aspect of the project.  There is so much in 1.3 as  
it is that I think our focus should be on getting it out sooner rather  
than adding things.  But small things that significantly improve the  
release are good too.


SOLR-561 seems like a rather large project to me (although I have  
never even used the existing collection distribution method).


Mike -- you removed SOLR-563 (Contrib area for Solr) from 1.3 but it  
is a

dependency for SOLR-469 (DataImportHandler) as it was decided to have
DataImportHandler as a contrib project. It would also be good to  
have a
rough release roadmaps to work against. Can fixed release cycle (say  
every 6

months) work for Solr?


Twice-yearly releases would be nice to aim for, but I think we're too  
small a project to fix release dates in advance.


-Mike


Re: Release of SOLR 1.3

2008-05-22 Thread Mike Klaas


On 22-May-08, at 12:13 AM, Andrew Savory wrote:


Sure, Commit-Then-Review vs. Review-Then-Commit ... but I don't
actually think RTC is going to ensure significantly more widespread
review, given the time burden on other developers to find the issue in
JIRA, download the patch, apply the patch, test, respond, then revert
the change. Do people really have the time to do that?  It's
significantly more effort than that to svn update, look at code, and
feed back. I prefer detailed discussion on the mailing list (which
supports decent threading, quoting etc, unlike JIRA) followed by
commit of a trial implementation which can then be refactored.
Otherwise there might be a tendency to analysis paralysis. But I'm the
new boy here, so I'll STFU and try to help out on the release instead
of forcing y'all to rehash old discussions on how to run an open
source project ;-) Maybe by the time 1.3 is out the door we'll all be
using distributed SCM systems and the discussion will be moot anyway!


I think we agree in principle--a patch does not have to be spotless to  
be committed.  I also agree that the mailing list is a preferable  
place to hash out design details.  But it is necessary that the basic  
approach is one we feel we will stick with before getting committed.  I  
don't think this imposes much of a burden on people aiming to review a  
patch.


It is true that using patches takes an extra minute or two to set up,  
but the time to evaluate a contribution is _by far_ mostly contained  
in understanding the contribution, its implications, and examining the  
code.  Plus, the patch is much easier to back out of a given  
repository and makes it easier to see exactly what changes were made.   
Since contributors can't commit to the repository anyway, I don't see  
much disadvantage in working with patches.


(btw, if you want a one-line equivalent to svn up, try something like:

$ wget http://issues.apache.org/jira/secure/attachment/12381498/SOLR-563.patch -O - | patch -p0


Reverting is also one line:
$ svn revert -R .

Although this leaves added files, which can be removed with
$ svn st | grep '?' | awk '{print $2}' | xargs rm

Another useful trick is to have multiple checkouts of trunk and  
bounce an active changeset from one to another with

$ svn diff | (cd ../otherbranch; patch -p0)
)

-Mike



[important] call for 1.3 planning

2008-05-21 Thread Mike Klaas
There seems to be some sort of consensus building that there should be  
a 1.3 release in the near future.  The first step is to figure out  
what we want to finish before it gets released.


The list of JIRA issues currently labeled 1.3 can be found here:

http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12312486 



Let's try to get an assignee for every issue in that list by a week  
from now.  If nobody steps up for an issue in that time, I'll assume  
it is low enough priority to move post-1.3.  This would also be a good  
time to add any issues that you want to champion for 1.3.


(This isn't meant to be a final list, just something to help get us  
started.  Most of the unassigned issues were reported by committers,  
so that should hopefully make it easy to figure out the assignee.)


-Mike




[jira] Updated: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-556:


Fix Version/s: 1.3

 Highlighting of multi-valued fields returns snippets which span multiple 
 different values
 -

 Key: SOLR-556
 URL: https://issues.apache.org/jira/browse/SOLR-556
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Assignee: Mike Klaas
Priority: Minor
 Fix For: 1.3

 Attachments: solr-highlight-multivalued-example.xml, 
 solr-highlight-multivalued.patch


 When highlighting multi-valued fields, the highlighter sometimes returns 
 snippets which span multiple values, e.g. with values "foo" and "bar" and 
 search term "ba" the highlighter will create the snippet foo<em>ba</em>r. 
 Furthermore it sometimes returns smaller snippets than it should, e.g. with 
 value "foobar" and search term "oo" it will create the snippet <em>oo</em> 
 regardless of hl.fragsize.
 I have been unable to determine the real cause for this, or indeed what 
 actually goes on at all. To reproduce the problem, I've used the following 
 steps:
 * create an index with multi-valued fields, one document should have at least 
 3 values for these fields (in my case strings of length between 5 and 15 
 Japanese characters -- as far as I can tell plain old ASCII should produce 
 the same effect though)
 * search for part of a value in such a field with highlighting enabled, the 
 additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, 
 hl.mergeContiguous=true (changing the parameters does not seem to have any 
 effect on the result though)
 * highlighted snippets should show effects described above

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-536) Automatic binding of results to Beans (for solrj)

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-536:


Fix Version/s: (was: 1.3)

 Automatic binding of results to Beans (for solrj)
 -

 Key: SOLR-536
 URL: https://issues.apache.org/jira/browse/SOLR-536
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
Reporter: Noble Paul
Priority: Minor
 Attachments: SOLR-536.patch


 as we are using java5, we can use annotations to bind SolrDocument to java 
 beans directly.
 This can make the usage of solrj a bit simpler
 The QueryResponse class in solrj can have an extra method as follows
 public <T> List<T> getResultBeans(Class<T> klass)
 and the bean can have annotations as
 class MyBean{
 @Field("id") // name is optional
 String id;
 @Field("category")
 List<String> categories
 }
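
The annotation-binding idea can be sketched self-containedly. Below, a plain Map stands in for SolrDocument, and the @SolrField annotation and bind helper are illustrative names, not the API proposed in the attached patch:

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class BeanBindDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SolrField {
        String value() default "";   // field name; empty means "use the Java field name"
    }

    static class MyBean {
        @SolrField("id") String id;
        @SolrField("category") String category;
    }

    // Reflectively copy document values into annotated bean fields
    static <T> T bind(Map<String, Object> doc, Class<T> klass) throws Exception {
        T bean = klass.getDeclaredConstructor().newInstance();
        for (Field f : klass.getDeclaredFields()) {
            SolrField ann = f.getAnnotation(SolrField.class);
            if (ann == null) continue;
            String name = ann.value().isEmpty() ? f.getName() : ann.value();
            f.setAccessible(true);
            f.set(bean, doc.get(name));
        }
        return bean;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "doc1");
        doc.put("category", "books");
        MyBean b = bind(doc, MyBean.class);
        System.out.println(b.id + " " + b.category);  // doc1 books
    }
}
```

A real getResultBeans(Class&lt;T&gt;) would apply this binding to each document in the result list.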

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-579) Extend SimplePost with RecurseDirectories, threads, document encoding , number of docs per commit

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-579:


Fix Version/s: (was: 1.3)

 Extend SimplePost with RecurseDirectories, threads, document encoding , 
 number of docs per commit
 -

 Key: SOLR-579
 URL: https://issues.apache.org/jira/browse/SOLR-579
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.3
 Environment: Applies to all platforms
Reporter: Patrick Debois
Priority: Minor
   Original Estimate: 72h
  Remaining Estimate: 72h

 -When specifying a directory, simplepost should also read the contents of the 
 directory
 New options for the commandline (some only useful in DATAMODE=files)
 -RECURSEDIRS
 Recursive read of directories as an option; this is useful for 
 directories with a lot of files where the commandline expansion fails and 
 xargs is too slow
 -DOCENCODING (default = system encoding or UTF-8) 
 For non-UTF-8 clients, simplepost should include a way to set the 
 encoding of the documents posted
 -THREADSIZE (default =1 ) 
 For large volume posts, a threading pool makes sense , using JDK 1.5 
 Threadpool model
 -DOCSPERCOMMIT (default = 1)
 Number of documents after which a commit is done, instead of only at 
 the end
 Note: not to break the existing behaviour of the existing SimplePost tool 
 (post.sh) might be used in scripts 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-383) Add support for globalization/culture management

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-383.
-

   Resolution: Fixed
Fix Version/s: (was: 1.3)

 Add support for globalization/culture management
 

 Key: SOLR-383
 URL: https://issues.apache.org/jira/browse/SOLR-383
 Project: Solr
  Issue Type: Improvement
  Components: clients - C#
Affects Versions: 1.3
Reporter: Jeff Rodenburg
Assignee: Jeff Rodenburg
Priority: Minor

 SolrSharp should supply configuration and/or programmatic control over 
 windows culture settings.  This is important for working with data being 
 saved to indexes that carry certain formatting expectations for various types 
 of fields, both in SolrSharp as well as the solr field counterparts on the 
 server side.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-563) Contrib area for Solr

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-563:


Fix Version/s: (was: 1.3)

 Contrib area for Solr
 -

 Key: SOLR-563
 URL: https://issues.apache.org/jira/browse/SOLR-563
 Project: Solr
  Issue Type: Task
Affects Versions: 1.3
Reporter: Shalin Shekhar Mangar
 Attachments: SOLR-563.patch


 Add a contrib area for Solr and modify existing build.xml to build, package 
 and distribute contrib projects also.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-565) Component to abstract shards from clients

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-565:


Fix Version/s: (was: 1.3)

 Component to abstract shards from clients
 -

 Key: SOLR-565
 URL: https://issues.apache.org/jira/browse/SOLR-565
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: patrick o'leary
Priority: Minor
 Attachments: distributor_component.patch


 A component that will remove the need for calling clients to provide the 
 shards parameter for
 a distributed search. 
 As systems grow, it's better to manage shards within solr, rather than 
 managing each client.
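
 The idea behind such a component can be sketched without the Solr APIs:
 if the incoming request carries no shards parameter, fill it in from
 server-side configuration. The class and method names below are
 hypothetical, not the actual code in the attached patch:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardDefaults {
    // Server-managed shard list; in a real component this would come
    // from solrconfig.xml rather than being hard-coded.
    private final List<String> shards;

    public ShardDefaults(List<String> shards) {
        this.shards = shards;
    }

    // Add a "shards" parameter only when the client did not supply one,
    // so an explicit client choice still wins over the server default.
    public Map<String, String> apply(Map<String, String> params) {
        Map<String, String> out = new HashMap<>(params);
        out.putIfAbsent("shards", String.join(",", shards));
        return out;
    }

    public static void main(String[] args) {
        ShardDefaults d = new ShardDefaults(
                Arrays.asList("host1:8983/solr", "host2:8983/solr"));
        Map<String, String> req = new HashMap<>();
        req.put("q", "solr");
        // prints host1:8983/solr,host2:8983/solr
        System.out.println(d.apply(req).get("shards"));
    }
}
```

 Defaulting rather than overwriting keeps the component transparent to
 clients that already manage their own shard lists.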




[jira] Updated: (SOLR-551) Solr replication should include the schema also

2008-05-20 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-551:


Fix Version/s: (was: 1.3)

 Solr replication should include the schema also
 ---

 Key: SOLR-551
 URL: https://issues.apache.org/jira/browse/SOLR-551
 Project: Solr
  Issue Type: Improvement
  Components: replication
Affects Versions: 1.3
Reporter: Noble Paul

 The current Solr replication copies only the data directory. So if the
 schema changes and I re-index, it will blissfully copy the index
 and the slaves will fail because of the incompatible schema.
 The steps we currently follow are:
  * Stop rsync on the slaves
  * Update the master with the new schema
  * Re-index the data
  * For each slave:
  ** Kill the slave
  ** Clean the data directory
  ** Install the new schema
  ** Restart
  ** Do a manual snappull
 The amount of work the admin needs to do is significant (depending on
 the number of slaves), and these manual steps are very error prone.
 The solution:
 Make the replication mechanism handle schema replication as well, so
 that all I need to do is change the master and the slaves sync
 automatically.
 What is a good way to implement this?
 We have an idea along the following lines. It involves changes to the
 snapshooter and snappuller scripts and to the snapinstaller component:
  * Every time the snapshooter takes a snapshot, it records the
 timestamps of schema.xml and elevate.xml (all the files that might
 affect runtime behavior on the slaves).
  * For subsequent snapshots, if any of those timestamps has changed,
 it copies all of those files into the snapshot for replication.
  * The snappuller copies the new directory as usual.
  * The snapinstaller checks whether these config files are present;
 if so, it can:
  ** Create a temporary core
  ** Install the changed index and configuration
  ** Load it completely and swap it with the original core
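
 The timestamp check the snapshooter would need is small. A sketch in Java
 (the actual scripts are shell; the class and method names here are
 illustrative, and only the file names come from the proposal above):

```java
import java.io.File;
import java.nio.file.Files;

public class ConfigChangeCheck {
    // Config files whose changes must be replicated to the slaves,
    // per the proposal above.
    static final String[] CONFIG_FILES = {"schema.xml", "elevate.xml"};

    // Return true if any tracked config file is newer than the last
    // snapshot, meaning the snapshooter should copy the config files
    // into the snapshot alongside the index.
    static boolean configChangedSince(File confDir, long lastSnapshotMillis) {
        for (String name : CONFIG_FILES) {
            File f = new File(confDir, name);
            if (f.exists() && f.lastModified() > lastSnapshotMillis) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Demo against a throwaway conf directory containing a fresh schema.xml.
        File dir = Files.createTempDirectory("conf").toFile();
        new File(dir, "schema.xml").createNewFile();
        System.out.println(configChangedSince(dir, 0L)); // true
    }
}
```

 The snapinstaller side would then key its temporary-core-and-swap path
 off the presence of these files in the pulled snapshot.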



