Now, a lost data problem with trunk too

2010-09-14 Thread karl.wright
Hi folks,

It looks like the handle leak may be real - Simon Willnauer has been looking at 
it and could not find an explanation for the behavior I have been seeing.  But 
before we got too far on that problem, I encountered what appears to be an even 
more serious problem.  Specifically, I'm losing field data out of some records.

The index I'm building is fairly large - some 25M records when complete.  What 
I'm seeing is that the main searchable field (value) is not finding all the 
records it should.  I was able to locate one such record just now:

curl "http://localhost:8983/solr/nose/standard?fl=*,score&q=id:\"POI|DEU:205:20187477:1014564|brandenburger+tor\""
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">95</int><lst name="params"><str name="q">id:"POI|DEU:205:20187477:1014564|brandenburger tor"</str><str name="fl">*,score</str></lst></lst><result name="response" numFound="1" start="0" maxScore="17.335964"><doc><float name="score">17.335964</float><str name="entityid">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="id">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="reference">brandenburger tor, potsdam, deutschland</str><str name="type">poi</str> ... </doc></result>
</response>

.. but it is completely missing the supposedly required "value" field:

   <!-- The value field.  This contains the actual string that will be matched. -->
   <field name="value" type="string_idx" required="true" stored="false"/>

The code that does the indexing is straightforward, and *some* of the records 
of this class are indeed searchable via the value field, but others aren't.  
I know the value field is non-empty, because it is used to construct the id 
field, which is correct above.

Simon is also looking into this one, but if anyone else has advice for figuring 
out what's going wrong, please let me know.  FWIW, this is a trunk build from 
Monday morning.

Karl

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Now, a lost data problem with trunk too

2010-09-14 Thread Simon Willnauer
On Tue, Sep 14, 2010 at 10:37 AM,  karl.wri...@nokia.com wrote:
 [quoted message snipped]
 .. but it is completely missing the supposedly required "value" field:

   <!-- The value field.  This contains the actual string that will be matched. -->
   <field name="value" type="string_idx" required="true" stored="false"/>
that does not show up since it is not stored - maybe that's the reason :)

simon
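Simon's point is the indexed-versus-stored distinction: a field with stored="false" can still match queries, it just never appears in the returned document. A toy stdlib-only sketch of that behavior (this is an illustration, not the actual Lucene/Solr API):

```java
import java.util.*;

// Toy model: a field can be indexed (searchable) and/or stored
// (returned in results) independently of each other.
class ToyIndex {
    static class Field {
        final String name, value; final boolean indexed, stored;
        Field(String name, String value, boolean indexed, boolean stored) {
            this.name = name; this.value = value;
            this.indexed = indexed; this.stored = stored;
        }
    }
    final List<List<Field>> docs = new ArrayList<>();
    void add(List<Field> doc) { docs.add(doc); }

    // A doc matches if any *indexed* field has the queried value...
    List<Map<String, String>> search(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (List<Field> doc : docs) {
            boolean match = doc.stream().anyMatch(
                f -> f.indexed && f.name.equals(field) && f.value.equals(value));
            if (!match) continue;
            // ...but the returned document contains only *stored* fields.
            Map<String, String> visible = new LinkedHashMap<>();
            for (Field f : doc) if (f.stored) visible.put(f.name, f.value);
            hits.add(visible);
        }
        return hits;
    }
}
```

So a hit on the value field proves it was indexed, while its absence from the response proves nothing either way.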




Re: Fwd: Trunk file handle leak?

2010-09-14 Thread Simon Willnauer
An update on this: the error was on my side, doubly incrementing the
searcher reference. No problem on trunk!

simon

On Fri, Sep 10, 2010 at 10:04 PM, Simon Rosenthal
simon.rosent...@yahoo.com wrote:
 Karl:

 I reported something very similar a few months back and opened a Jira issue
 - see https://issues.apache.org/jira/browse/SOLR-1911. After I changed to a
 newer nightly build the leak went away. The issue is still open, so you may
 want to update it.

 -Simon

 

 -- Forwarded message --
 From: karl.wri...@nokia.com
 Date: Fri, Sep 10, 2010 at 3:24 PM
 Subject: RE: Trunk file handle leak?
 To: dev@lucene.apache.org, yo...@lucidimagination.com


 Hi Yonik,

 Be that as it may, I'm seeing a steady increase in file handles used by that
 process over an extended period of time (now 20+ minutes):

 r...@duck6:~# lsof -p 22379 | wc
    786    7714  108339
 r...@duck6:~# lsof -p 22379 | wc
    787    7723  108469
 r...@duck6:~# lsof -p 22379 | wc
    787    7723  108469
 r...@duck6:~# lsof -p 22379 | wc
    812    7948  111719
 r...@duck6:~# lsof -p 22379 | wc
    816    7984  112239
 r...@duck6:~# lsof -p 22379 | wc
    817    7993  112369
 r...@duck6:~# lsof -p 22379 | wc
    822    8038  113019
 r...@duck6:~# lsof -p 22379 | wc
    847    8308  116719
 r...@duck6:~# lsof -p 22379 | wc
    852    8353  117369
 r...@duck6:~# lsof -p 22379 | wc
    897    8803  123669
 r...@duck6:~# lsof -p 22379 | wc
   1022   10018  140819
 r...@duck6:~#

 This doesn't smell like spiky resource usage to me.  It smells like a leak.
 ;-)

 Karl

 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik
 Seeley
 Sent: Friday, September 10, 2010 2:08 PM
 To: dev@lucene.apache.org
 Subject: Re: Trunk file handle leak?

 On Fri, Sep 10, 2010 at 1:51 PM,  karl.wri...@nokia.com wrote:
 (1) There are periodic commits, every 10,000 records.
 (2) I have no searcher/reader open at the same time, that I am aware of.
  This is a straight indexing task.  (You ought to know, you wrote some of
 the code!)

 A commit currently opens a new searcher in Solr.
 It's not too hard to go past 1024 descriptors - either raise the limit
 to 10240 or something, use the compound file format, or lower the
 merge factor.


 (3) I *do* see auto warming being called, but it seems not to happen at
 the same time as a commit, but rather afterwards.

 Once it starts happening, this happens repeatedly on every commit.

 This would also be expected - it's at a point where there are too many
 files for your descriptors.

 -Yonik
 http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8
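 For reference, the latter two of Yonik's suggestions would live in
 solrconfig.xml. This is a hedged sketch, with element placement as in the
 Solr of that era - check your own config before copying:

 ```xml
 <!-- solrconfig.xml sketch; verify element names against your version -->
 <indexDefaults>
   <!-- fewer segments queued before a merge => fewer files open at once -->
   <mergeFactor>10</mergeFactor>
   <!-- compound file format: one .cfs per segment instead of many files -->
   <useCompoundFile>true</useCompoundFile>
 </indexDefaults>
 ```

 The first suggestion is an OS setting, e.g. running `ulimit -n 10240` in
 the shell that starts the JVM.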








[jira] Commented: (SOLR-2106) Spelling Checking for Multiple Fields

2010-09-14 Thread JAYABAALAN V (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909157#action_12909157
 ] 

JAYABAALAN V commented on SOLR-2106:


What is the procedure to apply the SOLR-2010.patch file to an existing 
Apache Solr v1.4 installation?

 Spelling Checking for Multiple Fields
 -

 Key: SOLR-2106
 URL: https://issues.apache.org/jira/browse/SOLR-2106
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.4
 Environment: Linux Environment
Reporter: JAYABAALAN V
 Fix For: 1.4

   Original Estimate: 0.02h
  Remaining Estimate: 0.02h

 Need to enable spellchecking for five different fields and its configuration. 
 I am using the dismax query parser for searching the different fields. If a 
 user has entered a wrong spelling in the front end, it should check the five 
 different fields and give a collated spelling suggestion in the front end, 
 and should return results based on the spelling suggestion. Do provide your 
 configuration details for the same...
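A common approach to this (field and type names below are hypothetical, not from the reporter's schema) is to copyField the several searchable fields into one aggregate field and point the spellchecker at that:

```xml
<!-- schema.xml sketch; "textSpell" and the source field names are made up -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<copyField source="title"    dest="spell"/>
<copyField source="body"     dest="spell"/>
<copyField source="author"   dest="spell"/>
<copyField source="keywords" dest="spell"/>
<copyField source="summary"  dest="spell"/>
```

The spellcheck component's field would then be set to the aggregate field, with spellcheck.collate=true to get a single corrected query back.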

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (SOLR-1900) move Solr to flex APIs

2010-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909160#action_12909160
 ] 

Michael McCandless commented on SOLR-1900:
--

I think it makes sense to move append to BytesRef, though I wonder if it should 
over-allocate (ArrayUtil.oversize) when it grows?  I realize for the current 
calls to append we don't need that (you just append bigTerm, once), but if 
someone uses this like a StringBuffer... though, this isn't really the 
intention of BytesRef, so maybe it's OK not to oversize.
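The amortization being weighed here can be sketched with a toy growable buffer (not the real BytesRef/ArrayUtil code): growing to exactly the needed size makes N appends cost O(N^2) in copying, while a small over-allocation keeps the total amortized linear.

```java
// Toy byte-buffer append, StringBuffer-style; the oversize formula
// (~1/8 extra headroom) is illustrative, not ArrayUtil's exact math.
class GrowableBytes {
    byte[] bytes = new byte[0];
    int length;

    void append(byte[] other, int offset, int len) {
        int newLen = length + len;
        if (newLen > bytes.length) {
            // over-allocate so repeated appends don't copy every time
            int newSize = newLen + (newLen >> 3) + 3;
            bytes = java.util.Arrays.copyOf(bytes, newSize);
        }
        System.arraycopy(other, offset, bytes, length, len);
        length = newLen;
    }
}
```

For a single append of bigTerm the over-allocation buys nothing, which is the trade-off in question.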

 move Solr to flex APIs
 --

 Key: SOLR-1900
 URL: https://issues.apache.org/jira/browse/SOLR-1900
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, 
 SOLR-1900_bigTerm.txt, SOLR-1900_FileFloatSource.patch, 
 SOLR-1900_termsComponent.txt


 Solr should use flex APIs




RE: Now, a lost data problem with trunk too

2010-09-14 Thread karl.wright
Yes. Of course.  My oversight.

So I did the obvious thing and searched for the value field directly, and it is 
there:

<str name="id">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="language">ger</str><str name="latitude">52.39935</str><str name="longitude">13.04793</str><str name="reference">brandenburger tor, potsdam, deutschland</str>


So, something about the way I am searching for it is not right.  Looking 
elsewhere.

Karl
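One thing worth checking with ids like these: the classic query parser splits on whitespace and treats several characters specially, so a raw id:POI|DEU:...|brandenburger tor parses as multiple clauses rather than one term. A hypothetical helper (not the Solr API; SolrJ ships its own escaping utility) that quotes a raw value as a single phrase:

```java
// Hypothetical sketch: wrap a raw field value so the classic query
// parser treats it as one term. Quotes the value as a phrase and
// backslash-escapes embedded quotes and backslashes.
class QueryValue {
    static String quote(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : raw.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\');
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

The query then becomes id:"POI|DEU:205:20187477:1014564|brandenburger tor", which matches the id as a whole.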



From: ext Simon Willnauer [simon.willna...@googlemail.com]
Sent: Tuesday, September 14, 2010 4:52 AM
To: dev@lucene.apache.org
Subject: Re: Now, a lost data problem with trunk too

 [quoted message snipped]



Re: [jira] Commented: (SOLR-2106) Spelling Checking for Multiple Fields

2010-09-14 Thread Erick Erickson
See: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

Erick

On Tue, Sep 14, 2010 at 5:11 AM, JAYABAALAN V (JIRA) j...@apache.orgwrote:


[
 https://issues.apache.org/jira/browse/SOLR-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909157#action_12909157]

 JAYABAALAN V commented on SOLR-2106:
 

 What is the procedure to apply the SOLR-2010.patch file to an existing
 Apache Solr v1.4 installation?





[jira] Commented: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909214#action_12909214
 ] 

Robert Muir commented on LUCENE-2504:
-

{quote}
Java (Oracle) really needs to do something to address this.
{quote}

I think we all owe it to ourselves to stop equating Java with Sun/Oracle; if 
Java stays with Oracle, it's pretty obvious the language will die anyway.

{quote}
I think this is a severe and growing problem for Lucene going forward
- our search performance is crucial and we can't risk hotspot
randomly, substantially slowing things down by a lot.
{quote}

While I agree at the moment we should make efforts to work around issues like 
this,
I don't think we should jump the gun and make real design/architectural
choices based on Oracle bugs.

Especially for trunk, by the time we release Lucene 4.0 some other company
will probably own Java anyway.

{quote}
Not that we have a choice here... but I've often wondered whether .NET
has this same hotspot fickleness problem
{quote}

.NET is not a choice but generating C/C++ code is?
 

 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip


 sorting can be much slower on trunk than branch_3x




[jira] Commented: (SOLR-1682) Implement CollapseComponent

2010-09-14 Thread Varun Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909223#action_12909223
 ] 

Varun Gupta commented on SOLR-1682:
---

Is there any workaround to use Highlight and Facet components along with 
grouping?

 Implement CollapseComponent
 ---

 Key: SOLR-1682
 URL: https://issues.apache.org/jira/browse/SOLR-1682
 Project: Solr
  Issue Type: Sub-task
  Components: search
Reporter: Martijn van Groningen
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: field-collapsing.patch, SOLR-1682.patch, 
 SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, 
 SOLR-1682.patch, SOLR-1682.patch, SOLR-1682_prototype.patch, 
 SOLR-1682_prototype.patch, SOLR-1682_prototype.patch, SOLR-236.patch


 Child issue of SOLR-236. This issue is dedicated to field collapsing in 
 general and all its code (CollapseComponent, DocumentCollapsers and 
 CollapseCollectors). The main goal is the finalize the request parameters and 
 response format.




[jira] Commented: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909230#action_12909230
 ] 

Simon Willnauer commented on LUCENE-2504:
-

bq. I think we all owe it to ourselves to stop equating Java with Sun/Oracle; 
if Java stays with Oracle, it's pretty obvious the language will die anyway.
I agree with Robert that we should stop comparing against Sun JVMs all the time 
and turning everything upside down, specializing code here and there, or going 
one step further and generating C++ code. Who is going to maintain 
compatibility with Java-only environments? I could imagine us having something 
super special-purpose, like Mike did with DirectNIOFSDirectory, to work around 
unexposed methods like fadvise.

I think code specializations of very hot parts of Lucene are OK, and we should 
keep doing that as we have in some places, but it already makes things very 
complicated to follow. Without the knowledge of a committer, or of a person 
actively following the development, it is extremely difficult to comprehend 
the design decisions.

I would rather put effort into things like Harmony and make code we can 
control perform better, than introduce a preprocessor which generates code for 
a JVM owned by a company. Wouldn't it make way more sense to push OSS JVMs 
than to spend lots of time investigating .NET as an alternative, or a C/C++ 
code generator? Before I would go the C++ path, I'd rather use Java to host a 
C core like Lucy, which brings you as close as it gets to the machine.

bq. EG, see my post here:

interesting papers - seems we are touching the limits of Java though.






Re: Whither ORP?

2010-09-14 Thread Grant Ingersoll

On Sep 13, 2010, at 12:33 PM, Itamar Syn-Hershko wrote:

 With the proper two-way open-source development process (taking and then 
 giving) I think it can become an important part of open-IR technologies, just 
 like what Lucene did to the search engines world. What ORP has to offer is of 
 great interest to HebMorph, an open-source project of mine trying to decide 
 on what is the best way to index and search Hebrew texts.
 
 To this end I decided to put some of the development efforts of the HebMorph 
 project into making tools for the ORP. I have announced this before, but 
 unfortunately I had to attend to more pressing tasks before I could complete 
 this (and there was no response from the community anyway...). Just in case 
 you're interested in seeing what I came up with so far: 
 http://github.com/synhershko/Orev.

If you can, putting them up as a patch would be useful.  That way, we can show 
some progress.

 
 IMHO, the ORP should stand by itself, and relate to Lucene/Solr only as its 
 basis framework for these initial stages. Perhaps also try to attract more 
 people who could find an interest in what it has to offer, so it can really 
 start growing.
 
 Itamar.
 
 On 12/9/2010 1:29 PM, Grant Ingersoll wrote:
 On Sep 11, 2010, at 8:51 PM, Robert Muir wrote:

 i propose we take what we have and import into lucene-java's benchmark
 contrib.  it already has integration with wikipedia and reuters for perf
 purposes, and the quality package is actually there anyways.  later, maybe
 more people have time and contrib/benchmark evolves naturally... e.g. to
 modules/benchmark with solr support as a first big step.

 Yeah, that seems reasonable.  I have been thinking lately that it might be
 useful to pull our DocMaker stuff out separately from benchmark so that
 people have easy ways of generating content from things like Wikipedia, etc.

 Still, at the end of the day, I like what ORP _could_ bring to the table and
 to some extent I think that is lost by folding it into Lucene benchmark.

 On Sep 11, 2010 7:33 PM, Grant Ingersoll gsing...@apache.org wrote:

 Seems ORP isn't really catching on with people. I know personally I don't
 have the time I had hoped to have to get it going. At the same time, I
 really think it could be a good project. We've got some tools put together,
 but we still haven't done much about the bigger goal of a self contained
 evaluation.

 Any thoughts on how we should proceed with ORP?

 -Grant

--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8



Re: Whither ORP?

2010-09-14 Thread Grant Ingersoll
I think the biggest hurdle we have in front of us is curating a data set that 
we can redistribute.  I'm in the process of uploading all the ASF public mail 
archives as of Sept. 13 to Amazon S3.  I also have some tools (thanks to Chris 
Rhodes) for processing these into Solr XML.  I think this would give us a 
standard corpus to start with, and it would mimic some enterprise 
search/eDiscovery tasks pretty well.

At any rate, as with any community, the proof is in people stepping up to help 
out.  I like that so many people suggested we keep going.  As for what to do, I 
think the options are pretty wide open and there is opportunity for people to 
define the project w/o any previous encumbrances. 

Some ideas that have been kicked around in the past:
1. Creative-commons data set, judgments, queries
2. Open Street Map (spatial search)
3. Mail archives
4. A crowd sourcing application.  Given a set of documents and queries, have 
people provide judgments.  Ideally, this runs in a web container and we could 
probably even find resources to host it here.  Combining that with one of the 
items above, we would be on our way.  App could also solicit queries by 
providing users open search box and opportunities to browse the data.

I know much of this is simplistic, but it is a start.

-Grant


On Sep 13, 2010, at 9:04 PM, Dan Cardin wrote:

 Hello,
 
 I am new to ORP. I would like to contribute to the project. I do not have a
 lot of experience in this field of IR, crowd sourcing, or AI. If someone
 could take the lead and set a path forward, I would be willing to contribute
 my skill set to ORP.
 
 How can I help? I have a lot of experience doing software development and
 system administration.
 
 Cheers,
 --Dan
 
 On Mon, Sep 13, 2010 at 1:36 PM, Omar Alonso oralo...@yahoo.com wrote:
 
 I think ORP is a great candidate for crowdsourcing/human computation. In
 the last year or so there's been quite a bit of research and applications on
 this. See the page for the SIGIR workshop on using crowdsourcing for IR
 evaluation: 
 http://www.ischool.utexas.edu/~cse2010/
 
 Omar
 
 --- On Mon, 9/13/10, Itamar Syn-Hershko ita...@code972.com wrote:
 
 From: Itamar Syn-Hershko ita...@code972.com
 Subject: Re: Whither ORP?
 To: openrelevance-...@lucene.apache.org
 Date: Monday, September 13, 2010, 9:33 AM
 [quoted message snipped]

--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8



[jira] Commented: (LUCENE-2643) StringHelper#stringDifference is wrong about supplementary chars

2010-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909253#action_12909253
 ] 

Robert Muir commented on LUCENE-2643:
-

My vote would be to drop it if we aren't using it; it's @lucene.internal.

Since it's unused, it's not obvious that it's wrong (it's correct if you want 
the first code unit difference).

 StringHelper#stringDifference is wrong about supplementary chars 
 -

 Key: LUCENE-2643
 URL: https://issues.apache.org/jira/browse/LUCENE-2643
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0, 3.0.1, 3.0.2
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Trivial
 Fix For: 3.0.3, 3.1, 4.0

 Attachments: LUCENE-2643.patch


 StringHelper#stringDifference does not take supplementary characters into 
 account. Since this is not used internally at all, we should think about 
 removing it, but since it is not too complex I guess we should just fix it 
 for bwcompat reasons. For released versions we should really fix it, since 
 folks might use it. For trunk we could just drop it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2643) StringHelper#stringDifference is wrong about supplementary chars

2010-09-14 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2643:


Attachment: LUCENE-2643.patch

here is a patch




[jira] Commented: (LUCENE-2643) StringHelper#stringDifference is wrong about supplementary chars

2010-09-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909256#action_12909256
 ] 

Simon Willnauer commented on LUCENE-2643:
-

bq. since it's unused, it's not obvious that it's wrong (it's correct if you 
want the first code unit difference)

yeah - my interpretation would be that it's wrong, since you use 
String#charAt(int) with the index of the first code unit. anyway - we should 
drop it for trunk, but I am not sure if we should for 3.x. I mean, this is not 
that much of a deal anyway.
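The pitfall under discussion can be shown with a small stdlib-only sketch (an illustration, not the StringHelper code itself): a per-char (UTF-16 code unit) first-difference index can land on a low surrogate, inside a code point, while a code-point-aware variant backs up to the code point boundary.

```java
// First point of difference between two strings, two ways.
class StringDiff {
    // char-based: compares UTF-16 code units one at a time
    static int byCodeUnit(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++)
            if (a.charAt(i) != b.charAt(i)) return i;
        return n;
    }
    // code-point-aware: if the difference splits a surrogate pair,
    // back up to the start of the supplementary character
    static int byCodePoint(String a, String b) {
        int i = byCodeUnit(a, b);
        if (i > 0 && Character.isHighSurrogate(a.charAt(i - 1))) return i - 1;
        return i;
    }
}
```

With two strings sharing a high surrogate but differing in the low surrogate, the char-based index points into the middle of a character; the code-point-aware one does not.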





[jira] Commented: (LUCENE-2643) StringHelper#stringDifference is wrong about supplementary chars

2010-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909260#action_12909260
 ] 

Robert Muir commented on LUCENE-2643:
-

drop it in trunk and mark it deprecated in 3.x?

regardless of whether it's right or wrong, if we aren't using it, I think it's 
good to clean house.






Re: Whither ORP?

2010-09-14 Thread Dan Cardin
Hello,

This is a great start! I am interested in helping with the development of a
crowd sourcing application. The next step would be creating a set of
requirements for the web app. Would the ORP wiki be a good place to store
the requirements?

--Dan


On Tue, Sep 14, 2010 at 9:51 AM, Grant Ingersoll gsing...@apache.org wrote:

 I think the biggest hurdle we have in front of us is curating a data set
 that we can redistribute.  I'm in the process of uploading all the ASF
 public mail archives as of Sept. 13 to Amazon S3.  I also have some tools
 (thanks to Chris Rhodes) for processing this into Solr XML.  I think this
 would give us a standard corpus to start with, and it would mimic
 some enterprise search/eDiscovery tasks pretty well.

 At any rate, as with any community, the proof is in people stepping up to
 help out.  I like that so many people suggested we keep going.  As for what
 to do, I think the options are pretty wide open and there is opportunity for
 people to define the project w/o any previous encumbrances.

 Some ideas that have been kicked around in the past:
 1. Creative-commons data set, judgments, queries
 2. Open Street Map (spatial search)
 3. Mail archives
 4. A crowd sourcing application.  Given a set of documents and queries,
 have people provide judgments.  Ideally, this runs in a web container and we
 could probably even find resources to host it here.  Combining that with one
 of the items above, we would be on our way.  App could also solicit queries
 by providing users open search box and opportunities to browse the data.

 I know much of this is simplistic, but it is a start.

 -Grant


 On Sep 13, 2010, at 9:04 PM, Dan Cardin wrote:

  Hello,
 
  I am new to ORP. I would like to contribute to the project. I do not have a
  lot of experience in this field of IR, crowdsourcing, or AI. If someone
  could take the lead and set a path forward, I would be willing to contribute my
  skill set to ORP.
 
  How can I help? I have a lot of experience doing software development and
  system administration.
 
  Cheers,
  --Dan
 
  On Mon, Sep 13, 2010 at 1:36 PM, Omar Alonso oralo...@yahoo.com wrote:
 
  I think ORP is a great candidate for crowdsourcing/human computation. In
  the last year or so there's been quite a bit of research and
 applications on
  this. See the page for the SIGIR workshop on using crowdsourcing for IR
  evaluation: http://www.ischool.utexas.edu/~cse2010/
 
  Omar
 
  --- On Mon, 9/13/10, Itamar Syn-Hershko ita...@code972.com wrote:
 
  From: Itamar Syn-Hershko ita...@code972.com
  Subject: Re: Whither ORP?
  To: openrelevance-...@lucene.apache.org
  Date: Monday, September 13, 2010, 9:33 AM
  With the proper two-way open-source
  development process (taking and then giving) I think it can
  become an important part of open-IR technologies, just like
  what Lucene did for the search-engine world. What ORP has to
  offer is of great interest to HebMorph, an open-source
  project of mine trying to decide on what is the best way to
  index and search Hebrew texts.
 
  To this end I decided to put some of the development
  efforts of the HebMorph project into making tools for the
  ORP. I have announced this before, but unfortunately I had
  to attend to more pressing tasks before I could complete
  this (and there was no response from the community
  anyway...). Just in case you're interested in seeing what I
  came up with so far: http://github.com/synhershko/Orev.
 
  IMHO, the ORP should stand by itself, and relate to
  Lucene/Solr only as its basis framework for these initial
  stages. Perhaps also try to attract more people who could
  find an interest in what it has to offer, so it can really
  start growing.
 
  Itamar.
 
  On 12/9/2010 1:29 PM, Grant Ingersoll wrote:
  On Sep 11, 2010, at 8:51 PM, Robert Muir wrote:
 
 
  i propose we take what we have and import into
  lucene-java's benchmark
  contrib.  it already has integration with
  wikipedia and reuters for perf
  purposes, and the quality package is actually
  there anyways.  later, maybe
  more people have time and contrib/benchmark
  evolves naturally... e.g. to
  modules/benchmark with solr support as a first big
  step.
 
  Yeah, that seems reasonable.  I have been
  thinking lately that it might be useful to pull our DocMaker
  stuff out separately from benchmark so that people have easy
  ways of generating content from things like Wikipedia, etc.
 
  Still, at the end of the day, I like what ORP _could_
  bring to the table and to some extent I think that is lost
  by folding it into Lucene benchmark.
 
 
  On Sep 11, 2010 7:33 PM, Grant Ingersoll gsing...@apache.org
  wrote:
 
  Seems ORP isn't really catching on with
  people. I know personally I don't
 
  have the time I had hoped to have to get it going.
  At the same time, I
  really think it could be a good project. We've got
  some tools put 

Re: Whither ORP?

2010-09-14 Thread Robert Muir
On Tue, Sep 14, 2010 at 10:22 AM, Dan Cardin dcardin2...@gmail.com wrote:

 Hello,

 This is a great start! I am interested in helping with the development of a
 crowd sourcing application. The next step would be creating a set of
 requirements for the web app. Would the ORP wiki be a good place to store
 the requirements?


+1, don't hold back!

-- 
Robert Muir
rcm...@gmail.com


[jira] Commented: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909272#action_12909272
 ] 

Yonik Seeley commented on LUCENE-2504:
--

Looks like we're not using the correct comparators everywhere.
I was trying a slightly different way to implement sort-missing-last, and my 
first comparator only implements setNextReader(), but I'm now getting many 
UnsupportedOperationExceptions (i.e., the search process is using older 
comparators after calling setNextReader()).

One culprit is OneComparatorNonScoringCollector, and another is 
OneComparatorFieldValueHitQueue, I think.


 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip


 sorting can be much slower on trunk than branch_3x




Re: Whither ORP?

2010-09-14 Thread Simon Willnauer
On Tue, Sep 14, 2010 at 4:30 PM, Robert Muir rcm...@gmail.com wrote:
 On Tue, Sep 14, 2010 at 10:22 AM, Dan Cardin dcardin2...@gmail.com wrote:

 Hello,

 This is a great start! I am interested in helping with the development of a
 crowd sourcing application. The next step would be creating a set of
 requirements for the web app. Would the ORP wiki be a good place to store
 the requirements?


 +1, don't hold back!

+1 - we need some action here! go for it!

 --
 Robert Muir
 rcm...@gmail.com



[jira] Updated: (LUCENE-2630) make the build more friendly to apache harmony

2010-09-14 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2630:


Attachment: LUCENE-2630.patch

The Harmony developers applied the UTF-8 fix (HARMONY-6640), so
we don't need to hack MockTokenizer anymore.

I've updated the patch; 'ant test-core -Dbuild.compiler=extJavac' almost passes.

I'll iterate with some more test improvements now that we are getting somewhere.


 make the build more friendly to apache harmony
 --

 Key: LUCENE-2630
 URL: https://issues.apache.org/jira/browse/LUCENE-2630
 Project: Lucene - Java
  Issue Type: Task
  Components: Build, Tests
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-2630.patch, LUCENE-2630.patch


 as part of improved testing, I thought it would be a good idea to make the 
 build (ant test) more friendly to working under Apache Harmony.
 I'm not suggesting we de-optimize code for Sun JVMs or anything crazy like 
 that; only use it as a tool. For example:
 * bugs in tests/code: I found a test that expected ArrayIndexOutOfBoundsException 
   when the javadoc contract for the method is just IndexOutOfBoundsException; it 
   happens to always pass on the Sun JVM because that's the exception that 
   implementation always throws.
 * better reproduction of bugs: [2 months out of the 
 year|http://en.wikipedia.org/wiki/Unusual_software_bug#Phase_of_the_Moon_bug]
   it seems TestQueryParser fails with the Thai locale in a difficult-to-reproduce 
   way, but I *always* get similar failures with Harmony for this test class.
 * better stability and portability: we should try (if reasonable) to avoid 
   depending upon internal details. The same kinds of things that fail on Harmony 
   might suddenly fail in a future Sun JDK; because it's such a different impl, 
   it brings out a lot of interesting stuff.
 At the moment there are a lot of failures; I think many might be 
 caused by this: http://permalink.gmane.org/gmane.comp.java.harmony.devel/39484




Re: exceptions from solr/contrib/dataimporthandler and solr/contrib/extraction

2010-09-14 Thread Grant Ingersoll

On Sep 13, 2010, at 1:59 PM, Lance Norskog wrote:

 What I want you to do is, I want you to find the guys who are putting
 all the bugs in the code, and I want you to FIRE THEM!

He who is without bugs in his code may be the first to fire.

-Grant




[jira] Updated: (LUCENE-2630) make the build more friendly to apache harmony

2010-09-14 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2630:


Attachment: LUCENE-2630_charutils.patch





[jira] Commented: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909312#action_12909312
 ] 

Michael McCandless commented on LUCENE-2504:


bq. I'm now getting many UnsupportedOperationExceptions (i.e. the search 
process is using older comparators after calling setNextReader())

That's no good!

bq. One culprit is OneComparatorNonScoringCollector, and another is 
OneComparatorFieldValueHitQueue I think.

Hmm I don't see the problem -- eg OneComparatorNonScoringCollector saves the 
returned comparator from comparator.setNextReader.

Can you post the full exception?





[jira] Updated: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2504:
-

Attachment: LUCENE-2504.patch

Attaching a draft patch that seems to fix the issues (the ones I can find, at 
least).

bq. Hmm I don't see the problem - eg OneComparatorNonScoringCollector saves the 
returned comparator from comparator.setNextReader.

Yes, but FieldValueHitQueue has its own list of comparators that never get 
updated.
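The failure mode above can be sketched in miniature (hypothetical classes, not the real Lucene API): setNextReader() may hand back a *replacement* comparator, so any structure that cached the old reference keeps calling a stale instance unless the cache is refreshed too.

```java
// Hypothetical mini-version of the bug pattern: every holder of a comparator
// reference must switch to the instance returned by setNextReader(), or it
// ends up using a comparator bound to the wrong segment.
abstract class MiniComparator {
    abstract int compare(int slot, int doc);
    // May return 'this', or a new instance specialized for the next segment.
    abstract MiniComparator setNextReader(int[] segmentValues);
}

class PerSegmentComparator extends MiniComparator {
    private final int[] values;
    PerSegmentComparator(int[] values) { this.values = values; }
    int compare(int slot, int doc) {
        return values[slot] < values[doc] ? -1 : (values[slot] == values[doc] ? 0 : 1);
    }
    MiniComparator setNextReader(int[] segmentValues) {
        // Returns a fresh comparator bound to the new segment's values.
        return new PerSegmentComparator(segmentValues);
    }
}

public class StaleComparatorDemo {
    public static void main(String[] args) {
        MiniComparator current = new PerSegmentComparator(new int[] {5, 3, 9});
        MiniComparator[] queueCache = { current }; // a queue caching the reference

        MiniComparator next = current.setNextReader(new int[] {1, 2});
        // The collector switched to 'next', but the cached copy is now stale:
        System.out.println(next == queueCache[0]); // false: stale reference
        queueCache[0] = next;                      // the fix: refresh the cache
        System.out.println(next == queueCache[0]); // true
    }
}
```

This is exactly why a holder with "its own list of comparators that never gets updated" silently keeps using old instances.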





[jira] Commented: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909337#action_12909337
 ] 

Michael McCandless commented on LUCENE-2504:


{quote}
I think we all owe it to ourselves to stop equating Java with Oracle; if Java 
stays with Oracle it's pretty obvious the language will die anyway.
{quote}

Yeah I agree.

The open question is whether this hotspot fickleness is particular to
Oracle's java impl, or, is somehow endemic to bytecode VMs (.NET
included).  It's really a hard, complex problem (JIT compilation from
bytecode based on runtime data), so it wouldn't surprise me if it's
the latter, to varying degrees.

bq. .NET is not a choice but generating C/C++ code is?

As far as I know it's much easier to invoke C/C++ from Java than .NET
from Java.  C/C++ is also more portable than .NET, I think?  (There is
Mono -- how mature is it by now?)

{quote}
I don't think we should jump the gun and make real design/architectural
choices based on Oracle bugs.
{quote}

I expect source-code specialization will also buy sizable perf gains
irrespective of hotspot fickleness, and in non-Oracle java impls.
Generating a dedicated class, with one method doing all searching and
collecting, removes all kinds of barriers to the JIT compiler.  It
makes its job far easier.
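As an illustration of the kind of JIT barrier meant here (hypothetical code, not a real generator): a per-document call through an abstraction versus a specialized concrete loop the compiler can trivially inline:

```java
interface IntValueSource {
    int get(int doc);
}

public class SpecializationDemo {
    // Generic path: an interface call per document. With several different
    // implementations loaded, this call site can become megamorphic and the
    // JIT may be unable to inline it.
    static int sumGeneric(IntValueSource vs, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += vs.get(i);
        }
        return sum;
    }

    // "Generated" specialization: one concrete method with the loop body
    // inlined up front; no dispatch barriers for the JIT.
    static int sumSpecialized(int[] values, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        final int[] values = new int[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        IntValueSource vs = new IntValueSource() {
            public int get(int doc) { return values[doc]; }
        };
        System.out.println(sumGeneric(vs, values.length));
        System.out.println(sumSpecialized(values, values.length));
    }
}
```

Both paths compute the same sum; the difference is only in what the runtime compiler can see through, which is the point of generating dedicated classes.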

bq. I agree with Robert that we should stop comparing against Sun JVMs all the 
time and turning everything upside-down specializing code here and there, or going 
one step further and generating C++ code. Dude, who is gonna maintain 
compatibility with Java-only environments?

If we manage to pursue specialized code gen, it'll be a long time
coming!  My point about C/C++ is that if we do somehow manage to get a
working code gen framework online (for Java), the added cost to make
it also target C/C++ will be relatively small.  Ie, it's nearly for
free.

If we were to do this, that would not mean we'd abandon Java, of
course -- the framework would fully support pure java as well.

bq. I think that code specializations of very hot parts of Lucene are OK, and 
we should follow that way like we did in some places, but it already makes things 
very complicated to follow. 

You mean manual specialization right (like this issue)?

Yes, I think we will have to keep manually specializing, going
forward, until we have a code generator that
does it more cleanly...

bq. Would it make way more sense to push OSS JVMs than to spend lots of time 
investigating .NET as an alternative or a C/C++ code generator?

I think we should do both.

bq. Before I would go the C++ path I'd rather use Java to host a C core like 
lucy which brings you as close as it gets to the machine.

I think this (a Java wrapper for Lucy) is a great idea -- we should explore 
that, too.

bq. interesting papers - seems we are touching the limits of Java though.

Well that's the big question -- limits of Java, or limits of Sun/Oracle's impl.

It looks like harmony has a ways to go on absolute performance: I just
ran a very quick benchmark (TermQuery search on 10 M multi-segment
wiki index w/ a 50% random filter) and Oracle java 1.6.0_21 gets 15.6
QPS while Harmony 1.5.0-r946978 gets 9.5 QPS (Harmony 1.6.0-r946981
also gets 9.5 QPS).  I just ran java -server -Xms2g -Xmx2g; it's
possible by tuning Harmony (it has many awesome looking command-line
args!) it'd get faster...






[jira] Reopened: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-2504:



bq. Yes, but FieldValueHitQueue has its own list of comparators that never get 
updated.

Ugh, yes.





[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-14 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2575:
-

Attachment: LUCENE-2575.patch

Term frequency is recorded and returned.  There are Terms, TermsEnum, and DocsEnum 
implementations.  Still needed: the term vectors and doc stores exposed via the RAM 
reader, concurrency unit tests, and a payload unit test.  Still quite rough.

 Concurrent byte and int block implementations
 -

 Key: LUCENE-2575
 URL: https://issues.apache.org/jira/browse/LUCENE-2575
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch


 The current *BlockPool implementations aren't quite concurrent.
 We really need something that has a locking flush method, where
 flush is called at the end of adding a document. Once flushed,
 the newly written data would be available to all other reading
 threads (ie, postings etc). I'm not sure I understand the slices
 concept; it seems like it'd be easier to implement a seekable,
 random-access-file-like API. One would seek to a given position,
 then read or write from there. The underlying management of byte
 arrays could then be hidden?




Re: Whither ORP?

2010-09-14 Thread Itamar Syn-Hershko

On 14/9/2010 4:22 PM, Dan Cardin wrote:

Hello,

This is a great start! I am interested in helping with the development of a
crowd sourcing application. The next step would be creating a set of
requirements for the web app. Would the ORP wiki be a good place to store
the requirements?

--Dan


Uhm... this is actually what I just said I'm in the middle of doing. But 
perhaps doing some spec'ing through the wiki would result in a better 
product, so why not.


Please see 
http://search-lucene.com/m/pLgxg1HCef11subj=OpenRelevance+Viewer+Orev+ to 
get an idea of what I did there. Let's branch the discussion from there 
to get this going in the right direction...


As I wrote in the other message, this app can be accessed through 
http://github.com/synhershko/Orev (.NET / C# / NHibernate), and there's 
still some to do there.


Itamar.


Re: Whither ORP?

2010-09-14 Thread Itamar Syn-Hershko

On 14/9/2010 3:44 PM, Grant Ingersoll wrote:

If you can, putting them up as a patch would be useful.  That way, we can show 
some progress.


I will, but first it needs to be workable. It is 80% done, but still not 
that usable. I expect to be able to work on it again in a month or so. 
Or someone else could resume from where I stopped (in .NET, or port it 
to Java). I can share what is missing if anyone is interested.


Itamar.


[jira] Commented: (LUCENE-2630) make the build more friendly to apache harmony

2010-09-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909347#action_12909347
 ] 

Simon Willnauer commented on LUCENE-2630:
-

bq. Here's the patch for TestCharacterUtils.
looks good to me! go commit!





Re: Whither ORP?

2010-09-14 Thread Dan Cardin
Hello,

I will begin documenting some basic requirements for a crowd sourcing web
app. I will use some of the work done by Itamar as a basis for the
requirements.

--Dan

On Tue, Sep 14, 2010 at 1:18 PM, Itamar Syn-Hershko ita...@code972.com wrote:

 On 14/9/2010 3:44 PM, Grant Ingersoll wrote:

 If you can, putting them up as a patch would be useful.  That way, we can
 show some progress.


  I will, but first it needs to be workable. It is 80% done, but still not
  that usable. I expect to be able to work on it again in a month or so. Or
  someone else could resume from where I stopped (in .NET, or port it to
  Java). I can share what is missing if anyone is interested.

 Itamar.



[jira] Updated: (LUCENE-2630) make the build more friendly to apache harmony

2010-09-14 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2630:


Attachment: LUCENE-2630_intl.patch

here's a patch for the internationalization differences, since Harmony uses ICU.
* the collator gives a different order for Locale.US, though 
it's weird that we test the order of non-US characters under the US locale (it's 
not defined there and is inherited from the root locale).
I conditionalized this test as such:
{code}
  // the sort order of Ø versus U depends on the version of the rules being used
  // for the inherited root locale: Ø's order isn't specified in Locale.US since 
  // it's not used in English.
  private boolean oStrokeFirst = Collator.getInstance(new 
Locale("")).compare("Ø", "U") < 0;
{code}
* the Thai dictionary-based break iterator gives different results: I used text 
that both impls segment the same way.





[jira] Commented: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909407#action_12909407
 ] 

Yonik Seeley commented on LUCENE-2504:
--

bq. The open question is whether this hotspot fickleness is particular to 
Oracle's java impl, or, is somehow endemic to bytecode VMs (.NET included).

I tried IBM's latest Java 6 (SR8 FP1, 20100624).
It seems to have some of the same pitfalls as Oracle's JVM, just different ones.
The first run does not differ from the second run in the same JVM as it does 
with Oracle, but the first run itself has much more variation.  The worst case 
is worse, and just like the Oracle JVM, it gets stuck in its worst case.

Each run (of the complete set of fields) was done in a separate JVM, since two 
runs in the same JVM didn't really differ as they did in the Oracle JVM.


branch_3x:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run
|10|129|128|130|109|98|128|135
|1|128|123|127|127|98|128|135
|1000|129|130|130|128|98|130|136
|100|128|133|133|130|100|132|139
|10|150|153|153|154|122|153|159

trunk:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run
|10|217|81|383|99|79|78|215
|1|254|73|346|101|106|108|267
|1000|253|74|347|99|107|108|258
|100|253|107|394|98|107|102|255
|10|251|107|388|99|106|98|257

The second way of testing is to completely mix fields (no serial correlation 
between which field is sorted on).  This test is very predictable with the 
Oracle JVM, but I still see wide variability with the IBM JVM.  Here is the 
list of different runs for the IBM JVM (ms):

branch_3x
|128|129|123|120|128|100|95|74|130|91|120

trunk
|106|89|168|116|155|119|108|118|112|169|165

To my eye, it looks like we have more variability in trunk, due to increased 
use of abstractions?


 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, 
 LUCENE-2504.zip


 sorting can be much slower on trunk than branch_3x




[jira] Commented: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909456#action_12909456
 ] 

Yonik Seeley commented on LUCENE-2504:
--

OK, I've committed the fix to always use the latest-generation field comparator.
Not sure if this is the best way to handle it, but at least it's correct now 
and we can improve it further later.

 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, 
 LUCENE-2504.zip


 sorting can be much slower on trunk than branch_3x




[jira] Created: (SOLR-2120) Facet Field Value truncation

2010-09-14 Thread Niall O'Connor (JIRA)
Facet Field Value truncation


 Key: SOLR-2120
 URL: https://issues.apache.org/jira/browse/SOLR-2120
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.4.1
Reporter: Niall O'Connor


There is a 256-character limit on the length of indexed string values, which 
results in undesirable behavior for facet field values.  For example: 

<lst name="facet_fields">
  <lst name="pub_articletitle">
    <int name="1">2302</int>
    <int name="hiv">1403</int>
    <int name="type">1382</int>
  </lst>
  <lst name="tissue-antology">
    <int name="Lymph node,Organ component,Cardinal organ part,Anatomical structure,Material anatomical entity,Physical anatomical entity,Anatomical entity">419</int>
    <int name="Left frontal lobe,Frontal lobe,Lobe of cerebral hemisphere,Segment of cerebral hemisphere,Segment of telencephalon,Segment of forebrain,Segment of brain,Segment of neuraxis,Organ segment,Organ region,Cardinal organ part,Anatomical structure,*Material anatom">236</int>
    <int name="ical entity,Physical anatomical entity,Anatomical entity">236</int>*
  </lst>
</lst>

The last facet value in the list is being truncated and spills into a new 
facet value.  This also eats up a facet slot, since I usually only return the 
top 3. 

Is 256 characters a hard limit in the indexing strategy?




[jira] Created: (LUCENE-2644) LowerCaseTokenizer Does Not Behave As One Might Expect (or Desire)--Given Its Name

2010-09-14 Thread Scott Gonyea (JIRA)
LowerCaseTokenizer Does Not Behave As One Might Expect (or Desire)--Given Its 
Name
--

 Key: LUCENE-2644
 URL: https://issues.apache.org/jira/browse/LUCENE-2644
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Scott Gonyea
 Fix For: 3.0.3, 3.1, Realtime Branch, 4.0


While I understand some of the reasons for its design, the original 
LowerCaseTokenizer should have been named LowerCaseLetterTokenizer.

I feel that LowerCaseTokenizer makes too many assumptions about what to 
tokenize, and I have therefore patched it.  The *default* behavior will remain 
as it always has--to avoid breaking any implementations in which it's being 
used.

I have changed LowerCaseTokenizer to extend CharTokenizer (rather than 
LetterTokenizer).  LetterTokenizer's functionality was merged into the default 
behavior of LowerCaseTokenizer.

Getter/setter methods have been added to the LowerCaseTokenizer class, 
allowing you to turn on/off tokenizing by whitespace, numbers, and special 
(non-alphanumeric) characters.
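
The toggle described above can be sketched in plain Java (this is an illustration of the configurable behavior, not the actual patch API or Lucene's CharTokenizer):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a tokenizer that lowercases its output and can
// optionally keep digits inside tokens, mirroring the kind of configurable
// behavior the patch describes.
class ConfigurableLowerCaseTokenizer {
    private final boolean keepDigits;

    ConfigurableLowerCaseTokenizer(boolean keepDigits) {
        this.keepDigits = keepDigits;
    }

    // A character is part of a token if it is a letter, or a digit when enabled.
    private boolean isTokenChar(char c) {
        return Character.isLetter(c) || (keepDigits && Character.isDigit(c));
    }

    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());  // non-token char ends a token
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }
}
```

With keepDigits=false this matches classic letter-only tokenizing ("MP3 Player" becomes "mp", "player"); with keepDigits=true the digit stays in the token ("mp3", "player").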





[jira] Updated: (LUCENE-2644) LowerCaseTokenizer Does Not Behave As One Might Expect (or Desire)--Given Its Name

2010-09-14 Thread Scott Gonyea (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Gonyea updated LUCENE-2644:
-

Attachment: LowerCaseTokenizer.patch

This patch will retain original functionality, while permitting the user to 
modify the assumptions on which tokens are built.

 LowerCaseTokenizer Does Not Behave As One Might Expect (or Desire)--Given Its 
 Name
 --

 Key: LUCENE-2644
 URL: https://issues.apache.org/jira/browse/LUCENE-2644
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.0.2
Reporter: Scott Gonyea
 Fix For: 3.0.3, 3.1, Realtime Branch, 4.0

 Attachments: LowerCaseTokenizer.patch


 While I understand some of the reasons for its design, the original 
 LowerCaseTokenizer should have been named LowerCaseLetterTokenizer.
 I feel that LowerCaseTokenizer makes too many assumptions about what to 
 tokenize, and I have therefore patched it.  The *default* behavior will 
 remain as it always has--to avoid breaking any implementations for which it's 
 being used.
 I have changed LowerCaseTokenizer to extend CharTokenizer (rather than 
 LetterTokenizer).  LetterTokenizer's functionality was merged into the 
 default behavior of LowerCaseTokenizer.
 Getter/Setter methods have been added to the LowerCaseTokenizer Class, 
 allowing you to turn on / off tokenizing by white space, numbers, and special 
 (Non-Alpha/Numeric) characters.




[jira] Updated: (LUCENE-2504) sorting performance regression

2010-09-14 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2504:
-

Attachment: LUCENE-2504_SortMissingLast.patch

This was a simple attempt to simplify the comparators.  Static classes are 
used instead of inner classes.  Unfortunately, it didn't keep the JVMs from 
getting stuck in badly optimized code (it was a long shot for that), but it 
does result in a consistent 4% speedup.

It looks as simple as the previous version to my eye, so I'll commit if there 
are no objections.
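
The static-vs-inner distinction mentioned above can be sketched as follows (an illustration of the design choice, not Lucene's actual comparator code): a static nested class holds no hidden reference to an enclosing instance, so all its state is explicit.

```java
import java.util.Arrays;
import java.util.Comparator;

class ComparatorDemo {
    // Static nested class: everything it needs is passed in explicitly,
    // with no implicit pointer to an outer instance.
    static final class OrdComparator implements Comparator<Integer> {
        private final int[] ords;
        OrdComparator(int[] ords) { this.ords = ords; }
        @Override public int compare(Integer a, Integer b) {
            return Integer.compare(ords[a], ords[b]);
        }
    }

    // Sort doc ids by their ord values using the static comparator.
    static Integer[] sortByOrd(int[] ords) {
        Integer[] docs = new Integer[ords.length];
        for (int i = 0; i < ords.length; i++) docs[i] = i;
        Arrays.sort(docs, new OrdComparator(ords));
        return docs;
    }
}
```

An inner (non-static) class would carry an extra `Outer.this` field; keeping the comparator static makes its dependencies visible and the class self-contained.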


 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, 
 LUCENE-2504.zip, LUCENE-2504_SortMissingLast.patch


 sorting can be much slower on trunk than branch_3x




[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-14 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2575:
-

Attachment: LUCENE-2575.patch

Added a unit test for payloads, term vectors, and doc stores.  The reader 
flushes term vectors and doc stores on demand, once per reader.  Also, little 
things are getting cleaned up in the realtime branch.

 Concurrent byte and int block implementations
 -

 Key: LUCENE-2575
 URL: https://issues.apache.org/jira/browse/LUCENE-2575
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
 LUCENE-2575.patch


 The current *BlockPool implementations aren't quite concurrent.
 We really need something that has a locking flush method, where
 flush is called at the end of adding a document. Once flushed,
 the newly written data would be available to all other reading
 threads (ie, postings etc). I'm not sure I understand the slices
 concept, it seems like it'd be easier to implement a seekable
 random access file like API. One'd seek to a given position,
 then read or write from there. The underlying management of byte
 arrays could then be hidden?
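
The "seekable random access file like API" floated above can be sketched in plain Java (an illustration only -- not the actual ByteSliceReader or its slice/forwarding-address machinery):

```java
// A reader over a pool of fixed-size byte blocks that can seek to an
// absolute position and read from there; the block management is hidden
// behind seek/readByte, as the comment suggests.
class BlockPoolReader {
    private final byte[][] blocks;
    private final int blockSize;
    private int pos; // absolute position across all blocks

    BlockPoolReader(byte[][] blocks, int blockSize) {
        this.blocks = blocks;
        this.blockSize = blockSize;
    }

    void seek(int absolutePosition) { this.pos = absolutePosition; }

    byte readByte() {
        // Translate the absolute position into (block, offset) on the fly.
        byte b = blocks[pos / blockSize][pos % blockSize];
        pos++;
        return b;
    }
}
```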




[jira] Resolved: (SOLR-1194) Query Analyzer not Invoking for Custom FiledType - When we use Custom QParser Plugin

2010-09-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1194.


Resolution: Invalid

This sounds like a bug in your custom QParser -- the QParser is what calls the 
analyzer and constructs the query.

Without any information as to how FPersonQParserPlugin is implemented, there 
doesn't seem to be a bug here.

If your issue is that you have questions about how to implement 
FPersonQParserPlugin properly so that it uses the field's analyzer, please 
post that as a question to the solr-user mailing list.

 Query Analyzer not Invoking for Custom FiledType - When we use Custom QParser 
 Plugin
 

 Key: SOLR-1194
 URL: https://issues.apache.org/jira/browse/SOLR-1194
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
 Environment: Windows, Java 1.6. Solr 1.3
Reporter: Nagarajan.shanmugam
   Original Estimate: 2h
  Remaining Estimate: 2h

 Hi, I created a custom Solr field type kwd_names in schema.xml:

 <fieldType name="kwd_names" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PhoneticFilterFactory" encoder="Metaphone" inject="true"/>
   </analyzer>
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PhoneticFilterFactory" encoder="Metaphone" inject="true"/>
   </analyzer>
 </fieldType>

 I configured a requestHandler in solrconfig.xml with the custom QParserPlugin:

 <requestHandler name="fperson" class="solr.SearchHandler">
   <!-- default values for query parameters -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="defType">fpersonQueryParser</str>
   </lst>
 </requestHandler>
 <queryParser name="fpersonQueryParser"
   class="com.thinkronize.edudym.search.analysis.FPersonQParserPlugin"/>

 The query is issued as:

 SolrQuery q = new SolrQuery();
 q.setParam("q", "George");
 q.setParam("gender", "M");
 q.setQueryType(FPersonSearcher.QUERY_TYPE);
 server.query(q);

 When I fire the query, it doesn't invoke the query analyzer and doesn't give 
 any results.  But if I remove q.setQueryType, it invokes the query analyzer 
 and gives results.
 That means the query analyzer for that field is not invoked when I use the 
 custom QParser plugin.




[jira] Commented: (SOLR-2119) IndexSchema should log warning if analyzer is declared with charfilter/tokenizer/tokenfiler out of order

2010-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909511#action_12909511
 ] 

Robert Muir commented on SOLR-2119:
---

{quote}
There seems to be a segment of the user population that has a hard time 
understanding the distinction between a charfilter, a tokenizer, and a 
tokenfilter - while we can certainly try to improve the documentation about 
what exactly each does, and when they take effect in the analysis chain, one 
other thing we should do is try to educate people when they construct their 
analyzer in a way that doesn't make any sense.
{quote}

I think we should do both; this is a great idea.

{quote}
(we could easily make such a situation fail to initialize, but i'm not 
convinced that would be the best course of action, since some people may have 
schemas where they have declared a charFilter or tokenizer out of order 
relative to their tokenFilters, but are still getting correct results that 
work for them, and breaking their instance on upgrade doesn't seem like it 
would be productive)
{quote}

I would prefer a hard error.  I think someone who doesn't understand what 
tokenizers and filters do likely isn't looking at their log files either.

In my opinion, Solr should be more picky about its configuration.  Oftentimes 
if I haven't had enough sleep I will type tokenFilter instead of filter, and 
Solr simply ignores it completely instead of raising an error.

And I can't be the only one who does this; it's not obvious that tokenizer = 
Tokenizer, charFilter = CharFilter, analyzer = Analyzer, but filter = 
TokenFilter.
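
For reference, a correctly ordered analyzer declaration lists char filters first, then exactly one tokenizer, then token filters (the field name and factory choices here are just illustrative):

```xml
<fieldType name="text_example" class="solr.TextField">
  <analyzer>
    <!-- char filters run first, over the raw character stream -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- exactly one tokenizer turns the stream into tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- token filters run last, over the token stream -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```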


 IndexSchema should log warning if analyzer is declared with 
 charfilter/tokenizer/tokenfiler out of order
 --

 Key: SOLR-2119
 URL: https://issues.apache.org/jira/browse/SOLR-2119
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Hoss Man

 There seems to be a segment of the user population that has a hard time 
 understanding the distinction between a charfilter, a tokenizer, and a 
 tokenfilter -- while we can certainly try to improve the documentation about 
 what exactly each does, and when they take effect in the analysis chain, one 
 other thing we should do is try to educate people when they construct their 
 analyzer in a way that doesn't make any sense.
 At the moment, some people are attempting to do things like move the Foo 
 <tokenFilter/> before the <tokenizer/> to try and get certain behavior ... 
 at a minimum we should log a warning in this case that doing that doesn't 
 have the desired effect.
 (We could easily make such a situation fail to initialize, but I'm not 
 convinced that would be the best course of action, since some people may have 
 schemas where they have declared a charFilter or tokenizer out of order 
 relative to their tokenFilters, but are still getting correct results that 
 work for them, and breaking their instance on upgrade doesn't seem like it 
 would be productive.)




[jira] Created: (SOLR-2121) distributed highlighting using q.alt=*:* causes NPE in finishStages

2010-09-14 Thread Hoss Man (JIRA)
distributed highlighting using q.alt=*:* causes NPE in finishStages
---

 Key: SOLR-2121
 URL: https://issues.apache.org/jira/browse/SOLR-2121
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man


As noted on the mailing list by Ron Mayer, using the example configs and 
example data on trunk, this query works...

http://localhost:8983/solr/select?q.alt=*:*&hl=on&defType=edismax

...but this query causes a NullPointerException...

http://localhost:8983/solr/select?q.alt=*:*&hl=on&defType=edismax&shards=localhost:8983/solr

Stack Trace...

{noformat}
java.lang.NullPointerException
at 
org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:158)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:310)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1324)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)

{noformat}




[jira] Commented: (SOLR-2121) distributed highlighting using q.alt=*:* causes NPE in finishStages

2010-09-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909514#action_12909514
 ] 

Hoss Man commented on SOLR-2121:


Marc Sturlese posted his fix but it's not entirely obvious to me what exactly 
the necessary change is, or if the root cause isn't somewhere else...

{code}
  public void finishStage(ResponseBuilder rb) {
    boolean hasHighlighting = true;
    if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {

      Map.Entry<String, Object>[] arr = new
          NamedList.NamedListEntry[rb.resultIds.size()];

      // TODO: make a generic routine to do automatic merging of id keyed data
      for (ShardRequest sreq : rb.finished) {
        if ((sreq.purpose & ShardRequest.PURPOSE_GET_HIGHLIGHTS) == 0)
          continue;
        for (ShardResponse srsp : sreq.responses) {
          NamedList hl =
              (NamedList) srsp.getSolrResponse().getResponse().get("highlighting");
          // patch bug
          if (hl != null) {
            for (int i = 0; i < hl.size(); i++) {
              String id = hl.getName(i);
              ShardDoc sdoc = rb.resultIds.get(id);
              int idx = sdoc.positionInResponse;
              arr[idx] = new NamedList.NamedListEntry(id, hl.getVal(i));
            }
          } else {
            hasHighlighting = false;
          }
        }
      }

      // remove nulls in case not all docs were able to be retrieved
      // patch bug
      if (hasHighlighting) {
        rb.rsp.add("highlighting", removeNulls(new SimpleOrderedMap(arr)));
      }
    }
  }

{code}

 distributed highlighting using q.alt=*:* causes NPE in finishStages
 ---

 Key: SOLR-2121
 URL: https://issues.apache.org/jira/browse/SOLR-2121
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man

 As noted on the mailing list by Ron Mayer, using the example configs and 
 example data on trunk, this query works...
 http://localhost:8983/solr/select?q.alt=*:*&hl=on&defType=edismax
 ...but this query causes a NullPointerException...
 http://localhost:8983/solr/select?q.alt=*:*&hl=on&defType=edismax&shards=localhost:8983/solr
 Stack Trace...
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:158)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:310)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1324)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
 {noformat}




Current trunk example woes...

2010-09-14 Thread Erick Erickson
If I check out the current trunk, and from solr/ do an "ant clean example",
all is well, even up to starting Solr. But trying to hit anything on the
site gives a response in the browser starting with:

org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType:Error loading class 'solr.SpatialTileField'

Commenting the relevant fieldType out of schema.xml fixes this.
Should I open a Jira or does someone want to jump on it?

Erick


Obsolete instructions for Velocity ResponseWriter on the Wiki

2010-09-14 Thread Erick Erickson
For trunk, the instructions here:
http://wiki.apache.org/solr/VelocityResponseWriter about starting up
VRW/Solaritas are obsolete I think. It looks like all this has been folded
into core. I'll go up and add some notes for trunk/1.5 unless someone
objects.

Erick


Build failed in Hudson: Lucene-3.x #115

2010-09-14 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Lucene-3.x/115/changes

Changes:

[rmuir] LUCENE-2630: look for the correct exception according to javadoc 
contract

[gsingers] SOLR-1568: move DistanceUtils up a package

[gsingers] SOLR-1568: backport to 3.x

[rmuir] LUCENE-2630: allow lucene to be built with non-sun jvms

[rmuir] missing merge props for r996720

[rmuir] quiet this test

[rmuir] LUCENE-2642: merge Uwe's test improvements

[rmuir] LUCENE-2642: merge LuceneTestCase and LuceneTestCaseJ4

[rmuir] add exception ignore for extraction test

[rmuir] SOLR-2118: fix setTermIndexDivisor param to have its correct name

--
[...truncated 18329 lines...]
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 2.863 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.018 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.036 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 8.251 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestDocValues
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
[junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.249 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.102 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadNearQuery
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.65 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadTermQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.033 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestBasics
[junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 14.143 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestFieldMaskingSpanQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.663 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestNearSpansOrdered
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 0.093 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestPayloadSpans
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 1.56 sec
[junit] 
[junit] - Standard Output ---
[junit] 
[junit] Spans Dump --
[junit] payloads for span:2
[junit] doc:0 s:3 e:6 one:Entity:3
[junit] doc:0 s:3 e:6 three:Noise:5
[junit] 
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:0 s:0 e:3 rr:Noise:1
[junit] doc:0 s:0 e:3 yy:Noise:2
[junit] doc:0 s:0 e:3 xx:Entity:0
[junit] 
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:1 s:0 e:4 yy:Noise:1
[junit] doc:1 s:0 e:4 rr:Noise:3
[junit] doc:1 s:0 e:4 xx:Entity:0
[junit] 
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:0 s:0 e:3 rr:Noise:1
[junit] doc:0 s:0 e:3 yy:Noise:2
[junit] doc:0 s:0 e:3 xx:Entity:0
[junit] 
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:0 s:0 e:3 yy:Noise:2
[junit] doc:0 s:0 e:3 xx:Entity:0
[junit] doc:0 s:0 e:3 rr:Noise:1
[junit] 
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:1 s:0 e:4 rr:Noise:3
[junit] doc:1 s:0 e:4 xx:Entity:0
[junit] doc:1 s:0 e:4 yy:Noise:1
[junit] 
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:2 s:0 e:5 ss:Noise:2
[junit] doc:2 s:0 e:5 qq:Noise:1
[junit] doc:2 s:0 e:5 pp:Noise:3
[junit] 
[junit] Spans Dump --
[junit] payloads for span:8
[junit] doc:3 s:0 e:11 ten:Noise:9
[junit] doc:3 s:0 e:11 two:Noise:1
[junit] doc:3 s:0 e:11 six:Noise:5
[junit] doc:3 s:0 e:11 eleven:Noise:10
[junit] doc:3 s:0 e:11 five:Noise:4
[junit] doc:3 s:0 e:11 one:Entity:0
[junit] doc:3 s:0 e:11 three:Noise:2
[junit] doc:3 s:0 e:11 nine:Noise:8
[junit] 
[junit] Spans Dump --
[junit] payloads for span:8
[junit] doc:4 s:0 e:11 nine:Noise:0
[junit] doc:4 s:0 e:11 five:Noise:5
[junit] doc:4 s:0 e:11 eleven:Noise:9
[junit] doc:4 s:0 e:11 two:Noise:2
[junit] doc:4 s:0 e:11 one:Entity:1
[junit] doc:4 s:0 e:11 six:Noise:6
[junit] doc:4 s:0 e:11 ten:Noise:10

/trunk sortMissingLast=true status?

2010-09-14 Thread Ryan McKinley
Testing with r997128:

I have a field defined as:

<fieldType name="bytes" class="solr.TrieLongField"
    sortMissingLast="true" precisionStep="0" omitNorms="true"
    positionIncrementGap="0"/>

When I call ?sort=bytes desc, everything works as expected: the
biggest things are first.  When I call ?sort=bytes asc, the entries
without a bytes field all go first.

I am sort of following the changes in LUCENE-2504, which point to
oddities with sortMissingLast, but Yonik's comments in #997095 suggest
this should be working.

Am I missing something?

Thanks
Ryan




Re: Current trunk example woes...

2010-09-14 Thread Yonik Seeley
On Tue, Sep 14, 2010 at 8:16 PM, Erick Erickson erickerick...@gmail.com wrote:
 If I check out the current trunk, and from solr do an ant clean example
 all is well, even up to starting Solr. But trying to hit anything on the
 site gives a response in the browser starting with:

 org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
 fieldType:Error loading class 'solr.SpatialTileField'

 Commenting the relevant fieldType out of schema.xml fixes this. Should
 I open a Jira or does someone want to jump on it?

Hmmm, I can't reproduce this.
Something like http://localhost:8983/solr/select?q=solr seems to work fine.

Did you do an svn up at the trunk level (i.e. get lucene too)?

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8




Re: /trunk sortMissingLast=true status?

2010-09-14 Thread Yonik Seeley
On Tue, Sep 14, 2010 at 9:40 PM, Ryan McKinley ryan...@gmail.com wrote:
 Testing with r997128:

 I have a field defined as:
 <fieldType name="bytes" class="solr.TrieLongField"
     sortMissingLast="true" precisionStep="0" omitNorms="true"
     positionIncrementGap="0"/>

SortMissingLast/SortMissingFirst is currently only supported on
fields that internally use a StringIndex (now DocTermsIndex), because
that's the only FieldCache representation that records which documents
are missing the field (via an ord of 0).
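
The ord-0 trick described above can be sketched in plain Java (an illustration, not Lucene's FieldCache code): an ord of 0 marks "no value", so the sort can push those documents last even in ascending order.

```java
import java.util.Arrays;

class SortMissingLastDemo {
    // Sort doc ids ascending by ord, forcing docs with ord 0 ("missing") last.
    static Integer[] sortAscMissingLast(int[] ords) {
        Integer[] docs = new Integer[ords.length];
        for (int i = 0; i < ords.length; i++) docs[i] = i;
        Arrays.sort(docs, (a, b) -> {
            boolean missingA = ords[a] == 0, missingB = ords[b] == 0;
            if (missingA != missingB) return missingA ? 1 : -1; // missing last
            return Integer.compare(ords[a], ords[b]);           // normal asc
        });
        return docs;
    }
}
```

A numeric FieldCache array has no such sentinel: a document with no value and a document whose value is 0 look identical, which is why sortMissingLast cannot work there.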

There's a note in the example schema.xml:

<!-- The optional sortMissingLast and sortMissingFirst attributes are
     currently supported on types that are sorted internally as strings.
     This includes string, boolean, sint, slong, sfloat, sdouble, pdate -->

That's actually the only reason the sint type fields are still around.
 If we could distinguish between 0 and missing, we could
deprecate/remove the s* fields and always use trie fields.

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8




[jira] Created: (LUCENE-2645) False assertion of 0 position delta in StandardPostingsWriterImpl

2010-09-14 Thread David Smiley (JIRA)
False assertion of 0 position delta in StandardPostingsWriterImpl
--

 Key: LUCENE-2645
 URL: https://issues.apache.org/jira/browse/LUCENE-2645
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: David Smiley
Priority: Minor


StandardPostingsWriterImpl line 159 is:
{code:java}
assert delta > 0 || position == 0 || position == -1 : "position=" + position
    + " lastPosition=" + lastPosition; // not quite right (if pos=0 is
                                       // repeated twice we don't catch it)
{code}

I enable assertions when I run my unit tests, and I've found this assertion 
fails when delta is 0, which occurs when the same position value is sent in 
twice in a row.  Once I added RemoveDuplicatesTokenFilter, this problem went 
away.  Should I really be forced to add this filter?  I think delta >= 0 
would be a better assertion.
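
The delta in question comes from delta-encoding positions against the previous position, so repeating a position yields a delta of exactly 0 -- which is what trips the assertion. A plain-Java sketch of that encoding (not the actual StandardPostingsWriterImpl code):

```java
class PositionDeltaDemo {
    // Encode each position as the difference from the previous position,
    // the way postings writers delta-encode positions within a document.
    static int[] deltaEncode(int[] positions) {
        int[] deltas = new int[positions.length];
        int lastPosition = 0;
        for (int i = 0; i < positions.length; i++) {
            deltas[i] = positions[i] - lastPosition; // duplicate position => 0
            lastPosition = positions[i];
        }
        return deltas;
    }
}
```

For positions {3, 3, 7} the deltas are {3, 0, 4}: the repeated 3 produces the zero delta the assertion rejects.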




[jira] Commented: (LUCENE-2611) IntelliJ IDEA setup

2010-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909579#action_12909579
 ] 

Steven Rowe commented on LUCENE-2611:
-

Once Robert's latest patch on SOLR-2002 gets applied -- it moves around some of 
the Solr module structure -- the IntelliJ setup patches will need to be 
adjusted.

 IntelliJ IDEA setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming.
 The attached patch adds a new top level directory {{dev-tools/}} with sub-dir 
 {{idea/}} containing basic setup files for trunk, as well as a top-level ant 
 target named idea that copies these files into the proper locations.  This 
 arrangement avoids the messiness attendant to in-place project configuration 
 files directly checked into source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit test run per module is 
 included.
 Once {{ant idea}} has been run, the only configuration that must be performed 
 manually is configuring the project-level JDK.
 If this patch is committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination module files (*.iml) in each 
 module's directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-14 Thread Jason Rutherglen (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909580#action_12909580 ]

Jason Rutherglen commented on LUCENE-2575:
--

For the posting skip list we need to implement seek on the
ByteSliceReader. However, if we're rewriting a portion of a
slice, then I guess we could have a problem: we'd be storing an
absolute position in the skip list, but by the time we go to
look the value up, those bytes could have been altered so that
they no longer hold delta-encoded doc ids and are instead the
forwarding address of the next slice.

Do we need an intelligent mechanism that interacts with the byte
slice writer so it never points at byte array elements (ie the
ends of slices) that could later be converted into forwarding
addresses?
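The hazard can be sketched with a toy byte pool (illustrative only; Lucene's actual ByteBlockPool uses level-sized slices and multi-byte forwarding addresses, not the one-byte version here). A skip entry that records an absolute address can later find a forwarding address where its payload used to be:

```java
public class SliceHazard {
    static final int SLICE = 4;   // toy slice size
    byte[] pool = new byte[32];
    private int sliceStart = 0;
    private int upto = 0;

    // Append one payload byte; returns the absolute address it landed at.
    int write(byte b) {
        if (upto == sliceStart + SLICE) {
            // Current slice is full: relocate its last byte into a new slice
            // and replace it with a forwarding address to that slice.
            int next = upto;
            byte moved = pool[upto - 1];
            pool[upto - 1] = (byte) next; // no longer payload!
            sliceStart = next;
            pool[upto++] = moved;
        }
        int addr = upto;
        pool[upto++] = b;
        return addr;
    }

    public static void main(String[] args) {
        SliceHazard h = new SliceHazard();
        int addrOf13 = -1;
        for (byte b = 10; b <= 13; b++) addrOf13 = h.write(b);
        System.out.println(h.pool[addrOf13]); // 13 - the address is still valid
        h.write((byte) 14);                   // fills the slice, relocates that byte
        System.out.println(h.pool[addrOf13]); // 4 - now a forwarding address
    }
}
```

A skip-list entry holding {{addrOf13}} reads garbage after the relocation, which is exactly the interaction with the slice writer the question above is about.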

 Concurrent byte and int block implementations
 -

 Key: LUCENE-2575
 URL: https://issues.apache.org/jira/browse/LUCENE-2575
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
 LUCENE-2575.patch


 The current *BlockPool implementations aren't quite concurrent.
 We really need something with a locking flush method, where
 flush is called at the end of adding a document. Once flushed,
 the newly written data would be available to all other reading
 threads (ie, postings etc). I'm not sure I understand the slices
 concept; it seems like it'd be easier to implement a seekable
 random-access-file-like API. One would seek to a given position,
 then read or write from there. The underlying management of byte
 arrays could then be hidden?
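One way to read the "locking flush" idea is a single-writer pool that publishes a length after each document, so readers only ever see fully written bytes. This is a hedged sketch, not the patch's actual design (all names here are invented): a single atomic write of the flushed length establishes the happens-before edge that makes the writer's preceding plain array writes visible to readers.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy single-writer byte pool: append() is writer-private; publish() makes
// everything appended so far visible to concurrent readers in one step.
public class FlushedPool {
    final byte[] buf = new byte[1 << 16];
    private int writeUpto = 0;                          // touched only by the writer
    private final AtomicInteger flushed = new AtomicInteger(0);

    public void append(byte b) { buf[writeUpto++] = b; }

    // The "flush at the end of adding a document": one atomic write
    // publishes all plain writes above it to readers of flushed.
    public void publish() { flushed.set(writeUpto); }

    // Readers never look past the last published length.
    public int read(int pos) {
        return pos < flushed.get() ? buf[pos] & 0xFF : -1;
    }
}
```

A reader polling {{read(0)}} sees -1 until {{publish()}} runs, then the byte, matching "once flushed, the newly written data would be available to all other reading threads".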

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Hudson: Solr-3.x #104

2010-09-14 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Solr-3.x/104/changes

Changes:

[rmuir] LUCENE-2630: fix intl test bugs that rely on cldr version

[rmuir] LUCENE-2630: look for the correct exception according to javadoc 
contract

[gsingers] SOLR-1568: move DistanceUtils up a package

[gsingers] SOLR-1568: backport to 3.x

[rmuir] LUCENE-2630: allow lucene to be built with non-sun jvms

[rmuir] missing merge props for r996720

--
[...truncated 5476 lines...]

clover:

common.compile-core:

compile-core:

compile-test:
[javac] Compiling 1 source file to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test

init:

clover.setup:

clover.info:

clover:

compile-core:
[mkdir] Created dir: 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/misc/classes/java
[javac] Compiling 11 source files to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/misc/classes/java
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile:
 [echo] Building queries...
Trying to override old definition of task m2-deploy
Trying to override old definition of task invoke-javadoc

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:

clover:

common.compile-core:

compile-core:

compile-test:
[javac] Compiling 1 source file to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test

init:

clover.setup:

clover.info:

clover:

compile-core:

compile:
 [echo] Building spatial...
Trying to override old definition of task m2-deploy
Trying to override old definition of task invoke-javadoc

build-queries:

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:

clover:

common.compile-core:

compile-core:

compile-test:
[javac] Compiling 1 source file to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test

init:

clover.setup:

clover.info:

clover:

common.compile-core:
[mkdir] Created dir: 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/spatial/classes/java
[javac] Compiling 29 source files to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/spatial/classes/java
[javac] Note: 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/contrib/spatial/src/java/org/apache/lucene/spatial/tier/CartesianPolyFilterBuilder.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile-core:

compile:
 [echo] Building spellchecker...
Trying to override old definition of task m2-deploy
Trying to override old definition of task invoke-javadoc

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:

clover:

common.compile-core:

compile-core:

compile-test:
[javac] Compiling 1 source file to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test

init:

clover.setup:

clover.info:

clover:

compile-core:
[mkdir] Created dir: 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/spellchecker/classes/java
[javac] Compiling 12 source files to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/spellchecker/classes/java
[javac] Note: 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile:
 [echo] Building xml-query-parser...
Trying to override old definition of task m2-deploy
Trying to override old definition of task invoke-javadoc

build-queries:

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:

clover:

common.compile-core:

compile-core:

compile-test:
[javac] Compiling 1 source file to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test

init:

clover.setup:

clover.info:

clover:

common.compile-core:
[mkdir] Created dir: 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/xml-query-parser/classes/java
[javac] Compiling 36 source files to 
https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/xml-query-parser/classes/java
[javac] Note: