Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Andi Vajda


On Wed, 6 Apr 2011, Bill Janssen wrote:


Andi Vajda va...@apache.org wrote:



On Wed, 6 Apr 2011, Bill Janssen wrote:


Andi Vajda va...@apache.org wrote:


Unless I'm missing something here, you've got two options before you
break your users:
  1. fix your code before you ship it to them


Unfortunately, the code is out there for building, and the instructions,
also already out there, say, PyLucene 2.4 to 3.X.  I should be more
careful :-).


Given that APIs changed quite a bit between 2.x and 3.0 and that 2.x
deprecated APIs are removed from 3.1+ (unless I'm confused about
Lucene's deprecation policy (*)), your statement is a bit optimistic.


My Python code looks for the differences and handles them.  Of course, it
can't do that for the future :-).

Is there some ABI version # that I should be checking, instead?


There are two versions available from the lucene module:

>>> import lucene
>>> [(v, lucene.__dict__[v]) for v in dir(lucene) if 'VERSION' in v]
[('JCC_VERSION', '2.8'), ('VERSION', '3.1.0')]

There is also the lucene.Version object.
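
For example (a minimal sketch, assuming PyLucene 3.x, where initVM() takes no
required arguments):

  import lucene

  # module-level strings, available without starting the JVM
  print(lucene.VERSION)        # e.g. '3.1.0'
  print(lucene.JCC_VERSION)    # e.g. '2.8'

  # the wrapped org.apache.lucene.util.Version enum needs the JVM running
  lucene.initVM()
  print(lucene.Version.LUCENE_CURRENT)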

Andi..


Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Bill Janssen
Andi Vajda va...@apache.org wrote:

 There are two versions available from the lucene module:
 
 import lucene
 [(v, lucene.__dict__[v]) for v in dir(lucene) if 'VERSION' in v]
[('JCC_VERSION', '2.8'), ('VERSION', '3.1.0')]

I suppose I could make a list of all the (JCC_VERSION, VERSION) pairs
that I've personally verified that the code works with, and raise an error
if a user attempts to install UpLib using a PyLucene that isn't on that
list...  But that seems like a sub-optimal solution :-).
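
A rough sketch of that kind of guard (with hypothetical verified pairs) might
look like:

  import lucene

  # (JCC_VERSION, VERSION) pairs this code has actually been tested against
  TESTED_PAIRS = {('2.8', '3.1.0'), ('2.7', '3.0.3')}

  if (lucene.JCC_VERSION, lucene.VERSION) not in TESTED_PAIRS:
      raise RuntimeError('untested PyLucene %s (JCC %s)'
                         % (lucene.VERSION, lucene.JCC_VERSION))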

Bill


Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Andi Vajda


On Wed, 6 Apr 2011, Bill Janssen wrote:


Andi Vajda va...@apache.org wrote:


There are two versions available from the lucene module:

   import lucene
   [(v, lucene.__dict__[v]) for v in dir(lucene) if 'VERSION' in v]
   [('JCC_VERSION', '2.8'), ('VERSION', '3.1.0')]


I suppose I could make a list of all the (JCC_VERSION, VERSION) pairs
that I've personally verified that the code works with, and raise an error
if a user attempts to install UpLib using a PyLucene that isn't on that
list...  But that seems like a sub-optimal solution :-).


Seems like the best solution to me.
How can you be sure your code works otherwise?

Andi..


Re: My GSOC proposal

2011-04-06 Thread Simon Willnauer
Hey Varun,
On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Hi Varun,

 Those two issues would make a great GSoC!  Comments below...
+1

 On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
 varunthacker1...@gmail.com wrote:

 I would like to combine two tasks as part of my project
 namely-Directory createOutput and openInput should take an IOContext
 (Lucene-2793) and compliment it by Generalize DirectIOLinuxDir to
 UnixDir (Lucene-2795).

 The first part of the project is aimed at significantly reducing time
 taken to search during indexing by adding an IOContext which would
 store buffer size and have options to bypass the OS’s buffer cache
 (This is what causes the slowdown in search ) and other hints. Once
 completed I would move on to Lucene-2795 and generalize the Directory
 implementation to make a UnixDirectory .

 So, the first part (LUCENE-2793) should cause no change at all to
 performance, functionality, etc., because it's merely installing the
 plumbing (IOContext threaded throughout the low-level store APIs in
 Lucene) so that higher levels can send important details down to the
 Directory.  We'd fix IndexWriter/IndexReader to fill out this
 IOContext with the details (merging, flushing, new reader, etc.).

 There's some fun/freedom here in figuring out just what details should
 be included in IOContext... (eg: is it low level "set buffer size to 4 KB"
 or is it high level "I am opening a new near-real-time reader").
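
 To make that concrete, a rough sketch of the idea (Python pseudocode with
 hypothetical names, not the actual Lucene API):

   class IOContext:
       # describes *why* a file is being opened; the Directory decides *how*
       def __init__(self, purpose, buffer_size=4096):
           self.purpose = purpose            # e.g. 'merge', 'flush', 'nrt-reader'
           self.buffer_size = buffer_size

   class Directory:
       def open_input(self, name, context):
           # a real impl could pick O_DIRECT / NOREUSE / buffer size from context
           return open(name, 'rb', buffering=context.buffer_size)

   merge_ctx = IOContext('merge', buffer_size=1 << 20)   # big sequential reads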

 This first step is a rote cutover, just changing APIs but in no way
 taking advantage of the new APIs.

 The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
 by creating a UnixDir impl that, using JNI (C code), passes advanced
 flags when opening files, based on the incoming IOContext.

 The goal is a single UnixDir that has ifdefs so that it's usable
 across multiple Unices, and eg would use direct IO if the context is
 merging.  If we are ambitious we could rope Windows into the mix, too,
 and then this would be NativeDir...

 We can measure success by validating that a big merge while searching
 does not hurt search performance?  (Ie we should be able to reproduce
 the results from
 http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).

Thanks for the summary mike!

 I have spoken to Micheal McCandless and Simon Willnauer about
 undertaking these tasks. Micheal McCandless has agreed to mentor me .
 I would love to be able to contribute and learn from Apache Lucene
 community this summer. Also I would love suggestions on how to make my
 application proposal stronger.

 I think either Simon or I can be the official mentor, and then the
 other one of us (and other Lucene committers) will support/chime
 in...

I will take the official responsibility here once we are there!
simon

 This is an important change for Lucene!

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2455) admin/index.jsp double submit on IE

2011-04-06 Thread Jeffrey Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Chang updated SOLR-2455:


Attachment: SOLR-2455.patch

Modified both index.jsp and form.jsp to return false upon JS submit.

 admin/index.jsp double submit on IE
 ---

 Key: SOLR-2455
 URL: https://issues.apache.org/jira/browse/SOLR-2455
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: IE8
Reporter: Jeffrey Chang
Priority: Minor
  Labels: patch
 Attachments: SOLR-2455.patch, SOLR-2455.patch


 /admin/index.jsp could issue a double submit on IE causing Jetty to error out.
 Here are the steps to reproduce on IE8 (only applies to IE8 on occasional 
 basis, really more of an IE8 bug...):
 1. Open IE8
 2. Browse to http://localhost:8983/solr/admin
 3. Submit a query
 4. Displayed on Jetty log due to double submit:
 SEVERE: org.mortbay.jetty.EofException
 at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
  
 This can be fixed easily by modifying index.jsp's javascript submit to return 
 false:
 ... queryForm.submit(); return false; ...
 I will try to submit a patch for this easy fix, new to all this so please 
 bear with me...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Inquiries on SOLR DEV Contribution (SOLR-2455)

2011-04-06 Thread Jeffrey Chang
Hi All,

I'd like to start small and see how I can contribute to SOLR development.

By following http://wiki.apache.org/solr/HowToContribute, I've created a new
defect (SOLR-2455) and created a patch for it.

Not sure if I've done the right steps - can someone provide me some guidance
if I'm on the right track to make some contributions?

I'm still confused on how the committers decide which patch to include the
fixes into. E.g. for the fixes I contribute, since I modified from Trunk,
I'd assume it goes to SOLR 4.0.x?

Also, should I modify the JIRA case status to Resolved myself?

Thanks,
Jeff


[jira] [Commented] (SOLR-2455) admin/index.jsp double submit on IE

2011-04-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016281#comment-13016281
 ] 

Uwe Schindler commented on SOLR-2455:
-

Hi Jeffrey,

thanks for the fix. This is really an issue and has nothing to do with Internet
Explorer; the timing of the javascript calls in this browser just makes it
happen. In general: onclick handlers in javascript *must* return false to
prevent the default action. This is true in all browsers. You can try this out
with a simple web page link: <a href="gohere" onclick="window.alert('clicked');
return true;">..</a>. This link will first display the message box and then go
to gohere (in all browsers!), whereas <a href="gohere"
onclick="window.alert('clicked'); return false;">..</a> will only display the
message box.

Another fix for this would be to simply remove form.submit() and explicitly
return true.

 admin/index.jsp double submit on IE
 ---

 Key: SOLR-2455
 URL: https://issues.apache.org/jira/browse/SOLR-2455
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: IE8
Reporter: Jeffrey Chang
Priority: Minor
  Labels: patch
 Attachments: SOLR-2455.patch, SOLR-2455.patch


 /admin/index.jsp could issue a double submit on IE causing Jetty to error out.
 Here are the steps to reproduce on IE8 (only applies to IE8 on occasional 
 basis, really more of an IE8 bug...):
 1. Open IE8
 2. Browse to http://localhost:8983/solr/admin
 3. Submit a query
 4. Displayed on Jetty log due to double submit:
 SEVERE: org.mortbay.jetty.EofException
 at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
  
 This can be fixed easily by modifying index.jsp's javascript submit to return 
 false:
 ... queryForm.submit(); return false; ...
 I will try to submit a patch for this easy fix, new to all this so please 
 bear with me...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2455) admin/index.jsp double submit on IE

2011-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned SOLR-2455:
---

Assignee: Uwe Schindler

 admin/index.jsp double submit on IE
 ---

 Key: SOLR-2455
 URL: https://issues.apache.org/jira/browse/SOLR-2455
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: IE8
Reporter: Jeffrey Chang
Assignee: Uwe Schindler
Priority: Minor
  Labels: patch
 Attachments: SOLR-2455.patch, SOLR-2455.patch


 /admin/index.jsp could issue a double submit on IE causing Jetty to error out.
 Here are the steps to reproduce on IE8 (only applies to IE8 on occasional 
 basis, really more of an IE8 bug...):
 1. Open IE8
 2. Browse to http://localhost:8983/solr/admin
 3. Submit a query
 4. Displayed on Jetty log due to double submit:
 SEVERE: org.mortbay.jetty.EofException
 at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
  
 This can be fixed easily by modifying index.jsp's javascript submit to return 
 false:
 ... queryForm.submit(); return false; ...
 I will try to submit a patch for this easy fix, new to all this so please 
 bear with me...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2455) admin/index.jsp double submit on IE

2011-04-06 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2455:


Fix Version/s: 4.0
   3.2
   3.1.1

 admin/index.jsp double submit on IE
 ---

 Key: SOLR-2455
 URL: https://issues.apache.org/jira/browse/SOLR-2455
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: IE8
Reporter: Jeffrey Chang
Assignee: Uwe Schindler
Priority: Minor
  Labels: patch
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2455.patch, SOLR-2455.patch


 /admin/index.jsp could issue a double submit on IE causing Jetty to error out.
 Here are the steps to reproduce on IE8 (only applies to IE8 on occasional 
 basis, really more of an IE8 bug...):
 1. Open IE8
 2. Browse to http://localhost:8983/solr/admin
 3. Submit a query
 4. Displayed on Jetty log due to double submit:
 SEVERE: org.mortbay.jetty.EofException
 at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
  
 This can be fixed easily by modifying index.jsp's javascript submit to return 
 false:
 ... queryForm.submit(); return false; ...
 I will try to submit a patch for this easy fix, new to all this so please 
 bear with me...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Inquiries on SOLR DEV Contribution (SOLR-2455)

2011-04-06 Thread Uwe Schindler
Hi Jeffrey,

 

You don't have to do anything on this issue. I already assigned it to myself
and I will commit your patch to 4.0 (trunk) and backport through simple
merges.

 

In general to bring fixes in, simply open issues, we will take care. If a
fix is broken or not valid, somebody will notify you!

 

Thanks for helping to improve Solr!

 

Thanks!

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Jeffrey Chang [mailto:jclal...@gmail.com] 
Sent: Wednesday, April 06, 2011 8:51 AM
To: dev@lucene.apache.org
Subject: Inquiries on SOLR DEV Contribution (SOLR-2455)

 

Hi All,

 

I'd like to start small and see how I can contribute to SOLR development.

 

By following http://wiki.apache.org/solr/HowToContribute, I've created a new
defect (SOLR-2455) and created a patch for it.

 

Not sure if I've done the right steps - can someone provide me some guidance
if I'm on the right track to make some contributions?

 

I'm still confused on how the committers decide which patch to include the
fixes into. E.g. for the fixes I contribute, since I modified from Trunk,
I'd assume it goes to SOLR 4.0.x?

 

Also, should I modify the JIRA case status to Resolved myself?

 

Thanks,

Jeff



[jira] [Commented] (SOLR-2455) admin/index.jsp double submit on IE

2011-04-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016291#comment-13016291
 ] 

Uwe Schindler commented on SOLR-2455:
-

Committed trunk revision 1089335, branch 3.x revision 1089340

I will keep this open for possible backport to 3.1.1

 admin/index.jsp double submit on IE
 ---

 Key: SOLR-2455
 URL: https://issues.apache.org/jira/browse/SOLR-2455
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: IE8
Reporter: Jeffrey Chang
Assignee: Uwe Schindler
Priority: Minor
  Labels: patch
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2455.patch, SOLR-2455.patch


 /admin/index.jsp could issue a double submit on IE causing Jetty to error out.
 Here are the steps to reproduce on IE8 (only applies to IE8 on occasional 
 basis, really more of an IE8 bug...):
 1. Open IE8
 2. Browse to http://localhost:8983/solr/admin
 3. Submit a query
 4. Displayed on Jetty log due to double submit:
 SEVERE: org.mortbay.jetty.EofException
 at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
  
 This can be fixed easily by modifying index.jsp's javascript submit to return 
 false:
 ... queryForm.submit(); return false; ...
 I will try to submit a patch for this easy fix, new to all this so please 
 bear with me...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread JIRA
post.jar fails on non-XML updateHandlers


 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl


SimplePostTool.java by default tries to issue a commit after posting.
Problem is that it does this by appending <commit/> to the stream.
This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016314#comment-13016314
 ] 

Jan Høydahl commented on SOLR-2458:
---

Example:

{code}
lap:exampledocs janhoy$ java -Durl=http://localhost:8983/solr/update/csv -jar 
post.jar books.csv
SimplePostTool: version 1.3
SimplePostTool: POSTing files to http://localhost:8983/solr/update/csv..
SimplePostTool: POSTing file books.csv
SimplePostTool: COMMITting Solr index changes..
SimplePostTool: FATAL: Solr returned an error #400 undefined field commit/
{code}

The commit should be sent in a different way; the problem is how to know where and
how to send the commit in the case of non-standard URLs, such as
http://localhost:8983/solr/my/custom/updatehandler

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar

 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016326#comment-13016326
 ] 

Uwe Schindler commented on SOLR-2458:
-

The commit could be sent at the end as a single XML document in a separate
request if the content type of the data is different.

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar

 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016329#comment-13016329
 ] 

Jan Høydahl commented on SOLR-2458:
---

How would post.jar know the URL of the XmlUpdateRequestHandler?
A) We could assume .*/solr/update as 99% would not modify the defaults?
Or
B) Assume that all UpdateRequestHandlers support a GET parameter commit=true
In that case, we could append ?commit=true to the given URL.
I know for a fact that /solr/update/csv?commit=true will work
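
For illustration, option B at the HTTP level (a Python sketch with a
hypothetical file name, assuming the default example URL):

{code}
import urllib.request

url = 'http://localhost:8983/solr/update/csv'

# POST the CSV data itself
with open('books.csv', 'rb') as f:
    req = urllib.request.Request(url, data=f.read(),
                                 headers={'Content-Type': 'text/csv'})
    urllib.request.urlopen(req)

# then trigger the commit with a plain request parameter, no XML body needed
urllib.request.urlopen(url + '?commit=true')
{code}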


 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar

 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3015) UDIDIndexWriter keeps write lock on corrupt index

2011-04-06 Thread Christian Danninger (JIRA)
UDIDIndexWriter keeps write lock on corrupt index
-

 Key: LUCENE-3015
 URL: https://issues.apache.org/jira/browse/LUCENE-3015
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.3
 Environment: Lucene 2.9.3
Reporter: Christian Danninger


Trying to open an index writer with
new UDIDIndexWriter(directory, new FakeAnalyzer(), false);
keeps a write.lock.
Creating the IndexWriter will succeed, but a subsequent call to
UDIDIndexWriter.getCounter() in the constructor fails.
There is no possibility to remove write.lock via an API call.

The index writer is used to optimize the index; the index itself will be
created by a different index. So after some time the index will be valid
again, but the write lock still exists. So the process has to be ended first, and
afterward the write lock can be removed.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.

2011-04-06 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016372#comment-13016372
 ] 

Dawid Weiss commented on SOLR-2378:
---

I've been waiting for somebody to look at this patch, guys, just to confirm
that everything is fine with it. If so, I'd like to commit it and move on to
infix suggestion support, maybe.

 FST-based Lookup (suggestions) for prefix matches.
 --

 Key: SOLR-2378
 URL: https://issues.apache.org/jira/browse/SOLR-2378
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
  Labels: lookup, prefix
 Fix For: 4.0

 Attachments: SOLR-2378.patch


 Implement a subclass of Lookup based on finite state automata/ transducers 
 (Lucene FST package). This issue is for implementing a relatively basic 
 prefix matcher, we will handle infixes and other types of input matches 
 gradually. Impl. phases:
 - -write a DFA based suggester effectively identical to ternary tree based 
 solution right now,-
 - -baseline benchmark against tern. tree (memory consumption, rebuilding 
 speed, indexing speed; reuse Andrzej's benchmark code)-
 - -modify DFA to encode term weights directly in the automaton (optimize for 
 onlyMostPopular case)-
 - -benchmark again-
 - add infix suggestion support with prefix matches boosted higher (?)
 - benchmark again
 - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3015) UDIDIndexWriter keeps write lock on corrupt index

2011-04-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016373#comment-13016373
 ] 

Uwe Schindler commented on LUCENE-3015:
---

What are you talking about? Lucene has no class UDIDIndexWriter, so maybe
that's an external customization.

If this is the case, I will close the issue.

 UDIDIndexWriter keeps write lock on corrupt index
 -

 Key: LUCENE-3015
 URL: https://issues.apache.org/jira/browse/LUCENE-3015
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.3
 Environment: Lucene 2.9.3
Reporter: Christian Danninger

 Try to open an index writer with 
 new UDIDIndexWriter(directory, new FakeAnalyzer(), false);
 keeps a write.lock.
 Creating the IndexWriter will succeed, but a subsequent call to 
 UDIDIndexWriter.getCounter() in the constructor failes.
 There are no possibilites to remove write.lock per an API call.
 The index writer is used to optimize the index, the index itself will be 
 created by an different index. So after some time the index will be valid 
 again, but the write lock still exists. So the process has to ended first an 
 afterward the write lock could be removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3015) UDIDIndexWriter keeps write lock on corrupt index

2011-04-06 Thread Christian Danninger (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016375#comment-13016375
 ] 

Christian Danninger commented on LUCENE-3015:
-

Sorry about that, you are right.

I'll close the ticket.



 UDIDIndexWriter keeps write lock on corrupt index
 -

 Key: LUCENE-3015
 URL: https://issues.apache.org/jira/browse/LUCENE-3015
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.3
 Environment: Lucene 2.9.3
Reporter: Christian Danninger

 Try to open an index writer with 
 new UDIDIndexWriter(directory, new FakeAnalyzer(), false);
 keeps a write.lock.
 Creating the IndexWriter will succeed, but a subsequent call to 
 UDIDIndexWriter.getCounter() in the constructor failes.
 There are no possibilites to remove write.lock per an API call.
 The index writer is used to optimize the index, the index itself will be 
 created by an different index. So after some time the index will be valid 
 again, but the write lock still exists. So the process has to ended first an 
 afterward the write lock could be removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.

2011-04-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016378#comment-13016378
 ] 

Robert Muir commented on SOLR-2378:
---

Took a quick look:

Builder.add(char[], int, int, ..) adds codepoints 
(Character.codePointAt/Character.charCount) [utf-32 order] but the comparator 
you use when building the automaton compares characters [utf-16 order]. So if
someone has a term in the supplementary range in their index, the order will be 
inconsistent.

So I think the comparator should just compare codepoints (it should iterate 
with codePointAt/charCount too)?
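
A small illustration of the mismatch (Python: str comparison is by code point,
while encoding to UTF-16 and comparing the bytes mimics Java's char order):

{code}
# U+FF5F is in the BMP; U+10000 is supplementary (surrogate pair D800 DC00 in UTF-16)
bmp, supp = '\uff5f', '\U00010000'

by_codepoint = sorted([bmp, supp])
by_utf16 = sorted([bmp, supp], key=lambda s: s.encode('utf-16-be'))

print([hex(ord(c)) for c in by_codepoint])  # ['0xff5f', '0x10000']
print([hex(ord(c)) for c in by_utf16])      # ['0x10000', '0xff5f']
{code}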


 FST-based Lookup (suggestions) for prefix matches.
 --

 Key: SOLR-2378
 URL: https://issues.apache.org/jira/browse/SOLR-2378
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
  Labels: lookup, prefix
 Fix For: 4.0

 Attachments: SOLR-2378.patch


 Implement a subclass of Lookup based on finite state automata/ transducers 
 (Lucene FST package). This issue is for implementing a relatively basic 
 prefix matcher, we will handle infixes and other types of input matches 
 gradually. Impl. phases:
 - -write a DFA based suggester effectively identical to ternary tree based 
 solution right now,-
 - -baseline benchmark against tern. tree (memory consumption, rebuilding 
 speed, indexing speed; reuse Andrzej's benchmark code)-
 - -modify DFA to encode term weights directly in the automaton (optimize for 
 onlyMostPopular case)-
 - -benchmark again-
 - add infix suggestion support with prefix matches boosted higher (?)
 - benchmark again
 - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.

2011-04-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016381#comment-13016381
 ] 

Yonik Seeley commented on SOLR-2378:


If it causes too much of a lookup performance hit, the Builder could just build 
in utf-16 order too?

 FST-based Lookup (suggestions) for prefix matches.
 --

 Key: SOLR-2378
 URL: https://issues.apache.org/jira/browse/SOLR-2378
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
  Labels: lookup, prefix
 Fix For: 4.0

 Attachments: SOLR-2378.patch


 Implement a subclass of Lookup based on finite state automata/ transducers 
 (Lucene FST package). This issue is for implementing a relatively basic 
 prefix matcher, we will handle infixes and other types of input matches 
 gradually. Impl. phases:
 - -write a DFA based suggester effectively identical to ternary tree based 
 solution right now,-
 - -baseline benchmark against tern. tree (memory consumption, rebuilding 
 speed, indexing speed; reuse Andrzej's benchmark code)-
 - -modify DFA to encode term weights directly in the automaton (optimize for 
 onlyMostPopular case)-
 - -benchmark again-
 - add infix suggestion support with prefix matches boosted higher (?)
 - benchmark again
 - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Wiki docs about compression

2011-04-06 Thread Smiley, David W.
Yes, you're right, Eric.  The feature was removed in Solr 1.4.1, much to my
chagrin. On my todo list in the next couple of months or so, I intend to bring it
back -- at least for 4.0.

~ David

From: Eric Pugh [ep...@opensourceconnections.com]
Sent: Wednesday, April 06, 2011 12:27 AM
To: solr-...@lucene.apache.org
Subject: Wiki docs about compression

Correct me if I am wrong, but isn't compression of fields removed from Solr 
3.1?  I think the docs about compression on the wiki at 
http://wiki.apache.org/solr/SchemaXml need to clarify that in 3.1 these
features were removed!

Just wanted to confirm my understanding of this

Eric

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from 
http://www.packtpub.com/solr-1-4-enterprise-search-server
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit

2011-04-06 Thread Jayson Minard (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016382#comment-13016382
 ] 

Jayson Minard commented on SOLR-1155:
-

Is there interest in me updating this for 3.1?  It is a huge performance 
improvement over DirectUpdateHandler2 under heavy indexing load...

 Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
 -

 Key: SOLR-1155
 URL: https://issues.apache.org/jira/browse/SOLR-1155
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3, 1.4
Reporter: Jayson Minard
 Fix For: Next

 Attachments: SOLR-1155-release1.4-rev834789.patch, 
 SOLR-1155-trunk-rev834706.patch, Solr-1155.patch, Solr-1155.patch


 Currently DirectUpdateHandler2 will block adds during a commit, and it seems 
 to be possible with recent changes to Lucene to allow them to run 
 concurrently.  
 See: 
 http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.

2011-04-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016384#comment-13016384
 ] 

Robert Muir commented on SOLR-2378:
---

I am referring to build-time, not runtime here.

Run-time can handle supplementary characters wrong and I wouldn't object to
committing it, but currently if someone has terms > 0xFFFF in their index it
will prevent the FST from being built at all, and suggesting will not work?
(as I think the FST will throw an exception?)


 FST-based Lookup (suggestions) for prefix matches.
 --

 Key: SOLR-2378
 URL: https://issues.apache.org/jira/browse/SOLR-2378
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
  Labels: lookup, prefix
 Fix For: 4.0

 Attachments: SOLR-2378.patch


 Implement a subclass of Lookup based on finite state automata/ transducers 
 (Lucene FST package). This issue is for implementing a relatively basic 
 prefix matcher, we will handle infixes and other types of input matches 
 gradually. Impl. phases:
 - -write a DFA based suggester effectively identical to ternary tree based 
 solution right now,-
 - -baseline benchmark against tern. tree (memory consumption, rebuilding 
 speed, indexing speed; reuse Andrzej's benchmark code)-
 - -modify DFA to encode term weights directly in the automaton (optimize for 
 onlyMostPopular case)-
 - -benchmark again-
 - add infix suggestion support with prefix matches boosted higher (?)
 - benchmark again
 - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Spatial Future

2011-04-06 Thread Yonik Seeley
On Wed, Apr 6, 2011 at 9:30 AM, Grant Ingersoll
grant.ingers...@gmail.com wrote:
 By all means go for it.  I don't see any reason not too.  I guess in the end, 
 I'm not sure what you are asking us to do.  Do you want Lucene/Solr to remove 
 all of our spatial support in favor of incorporating this new project or do 
 you just want those who are interested in spatial to join the new project and 
 it can be seen as an add on?


Let's not confuse the issue... what is being discussed really has no
impact on the basic spatial search that was added to Solr.  As you
said yourself, the Solr geo stuff uses very little of the spatial
contrib stuff.

This is about building and maintaining a spatial module, and the best
place to do it (which I'll leave up to those doing the work... I'm
pretty happy with basic point, radius, bounding-box).

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Spatial Future

2011-04-06 Thread Ryan McKinley

 The spatial API in google code takes a pretty different approach to
 spatial search in general.  It is organized into three key packages:
 1. core stuff, no lucene dependencies.  Most of the math is here

 Aren't you just replicating what SIS is doing for this piece?  If you don't 
 have a JTS requirement, that means you are going to need equivalent math, 
 right?  Isn't that what SIS is about?


This package defines the general interfaces and concepts used in the
project.  Things like SpatialOperations, Shape, PrefixGrid and
DistanceCalculator -- these can then be backed by simple math, JTS, or
eventually maybe SIS.

The other key stuff in this package is the client-side objects used to
build spatial queries.  Essentially everything that could be bundled
with solrj.



 I could suggest a new ASF project, but there seems like too much
 overlap with SIS and very different philosophy on 3rd party libraries.
 In the end, osgeo.com seems like a more natural home and has better
 branding for spatial related work anyway.


 By all means go for it.  I don't see any reason not too.  I guess in the end, 
 I'm not sure what you are asking us to do.  Do you want Lucene/Solr to remove 
 all of our spatial support in favor of incorporating this new project or do 
 you just want those who are interested in spatial to join the new project and 
 it can be seen as an add on?


I'm trying to have an open discussion about what makes sense for
spatial development.  I don't *want* to start a new project... but I
think we need a dev/test environment that can support the whole range
of spatial needs -- without reinventing many wheels, this includes
JTS.

Lucene currently has LGPL compile dependencies, but they are on the
way out, and (unless I'm missing something) I don't think folks are
open to adding a JTS build/test dependency --  Maybe I should call a
vote on the JTS issue, though I suspect most binding votes are -0 or
-1.  I *totally* understand if other people don't want JTS in the
build system -- it is not a core concern to most people involved.

If the lucene build/test environment does not support spatial
development, this leads me to think about other places to host the
project...  wherever it makes the most sense.  I would prefer staying
within lucene because it is easiest for me.

I don't want this to be competition or duplicate effort.  I hope it
lets us clean up the broken stuff from lucene and over time deprecate
the parts that are better supported elsewhere.

I want the best spatial support available in solr out-of-the-box.  If
this project is eventually built and maintained outside of lucene, i
would like the .jar distributed in solr.


ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016395#comment-13016395
 ] 

Yonik Seeley commented on SOLR-2458:


bq. Assume that all UpdateRequestHandlers support a GET parameter commit=true

I think we should assume this, and fix anything where it doesn't work.

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar

 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6796 - Failure

2011-04-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6796/

2 tests failed.
REGRESSION:  org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message:
Error executing query

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119)
at 
org.apache.solr.cloud.BasicDistributedZkTest.queryServer(BasicDistributedZkTest.java:274)
at 
org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:335)
at 
org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:128)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:593)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
Caused by: org.apache.solr.common.SolrException: no servers hosting shard:

no servers hosting shard:

request: http://127.0.0.1:33773/solr/select?q=*:*&sort=n_ti1 desc&shards=shard3,shard4,shard5,shard6&distrib=true&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:249)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:152)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)


REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError: 
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
at 
org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:227)




Build Log (for compile errors):
[...truncated 8761 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Spatial Future

2011-04-06 Thread Grant Ingersoll

On Apr 6, 2011, at 10:45 AM, Ryan McKinley wrote:
 I'm trying to have an open discussion about what makes sense for
 spatial development.  I don't *want* to start a new project... but I
 think we need a dev/test environment that can support the whole range
 of spatial needs -- without reinventing many wheels, this includes
 JTS.
 
 Lucene currently has LGPL compile dependencies, but they are on the
 way out, and (unless I'm missing something) i don't think folks are
 open to adding a JTS build/test dependency --  Maybe I should call a
 vote on the JTS issue, though i suspect most binding votes are -0 or
 -1.  I *totally* understand if other people don't want JTS in the
 build system -- it is not a core concern to most people involved.

Until there is a specific patch that brings in and shows how JTS would be 
incorporated (via reflection and as a totally optional piece, presumably, per 
the ASF LGPL guidelines), there really isn't anything to vote on. 


 I don't want this to be competition or duplicate effort.  I hope it
 lets us clean up the broken stuff from lucene and overtime deprecate
 the parts that are better supported elsewhere.

I totally agree.  I hope I wasn't framing it that way.  I'm just trying to 
understand what's being proposed.  I can see advantages to both.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: My GSOC proposal

2011-04-06 Thread Varun Thacker
Hi. I wrote some sample code to test out the speed difference between SEQUENTIAL
and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.

This is the link to the code: http://pastebin.com/8QywKGyS

There was a speed difference when I switched between the two flags. I
have not used the O_DIRECT flag because Linus had criticized it.

Is this what the flags are intended to be used for? This is just sample
code with a test file.

On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer 
simon.willna...@googlemail.com wrote:
 Hey Varun,
 On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Hi Varun,

 Those two issues would make a great GSoC!  Comments below...
 +1

 On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
 varunthacker1...@gmail.com wrote:

 I would like to combine two tasks as part of my project
 namely-Directory createOutput and openInput should take an IOContext
 (Lucene-2793) and compliment it by Generalize DirectIOLinuxDir to
 UnixDir (Lucene-2795).

 The first part of the project is aimed at significantly reducing time
 taken to search during indexing by adding an IOContext which would
 store buffer size and have options to bypass the OS’s buffer cache
 (This is what causes the slowdown in search ) and other hints. Once
 completed I would move on to Lucene-2795 and generalize the Directory
 implementation to make a UnixDirectory .

 So, the first part (LUCENE-2793) should cause no change at all to
 performance, functionality, etc., because it's merely installing the
 plumbing (IOContext threaded throughout the low-level store APIs in
 Lucene) so that higher levels can send important details down to the
 Directory.  We'd fix IndexWriter/IndexReader to fill out this
 IOContext with the details (merging, flushing, new reader, etc.).

 There's some fun/freedom here in figuring out just what details should
 be included in IOContext... (eg: is it low level set buffer size to 4
KB
 or is it high level I am opening a new near-real-time reader).

 This first step is a rote cutover, just changing APIs but in no way
 taking advantage of the new APIs.

 The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
 by creating a UnixDir impl that, using JNI (C code), passes advanced
 flags when opening files, based on the incoming IOContext.

 The goal is a single UnixDir that has ifdefs so that it's usable
 across multiple Unices, and eg would use direct IO if the context is
 merging.  If we are ambitious we could rope Windows into the mix, too,
 and then this would be NativeDir...

 We can measure success by validating that a big merge while searching
 does not hurt search performance?  (Ie we should be able to reproduce
 the results from
 http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).

 Thanks for the summary mike!

 I have spoken to Micheal McCandless and Simon Willnauer about
 undertaking these tasks. Micheal McCandless has agreed to mentor me .
 I would love to be able to contribute and learn from Apache Lucene
 community this summer. Also I would love suggestions on how to make my
 application proposal stronger.

 I think either Simon or I can be the official mentor, and then the
 other one of us (and other Lucene committers) will support/chime
 in...

 I will take the official responsibility here once we are there!
 simon

 This is an important change for Lucene!

 Mike

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org






-- 


Regards,
Varun Thacker
http://varunthacker.wordpress.com


Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Bill Janssen
I'm seeing parse failures on this query string:

categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT 
categories:RSSReader/_noexpire_

thr002: RSSReader:  Traceback (most recent call last):
thr002:   File /local/lib/UpLib-1.7.11/site-extensions/RSSReader.py, line 
271, in _scan_rss_sites
thr002: hits = repo.do_query("categories:RSSReader AND 
id:[0-00--000 TO %s] AND NOT categories:RSSReader/_noexpire_" % old_id)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1172, 
in do_query
thr002: results = self.do_full_query(query_string, searchtype)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1196, 
in do_full_query
thr002: results = self.pylucene_search(searchtype, query_string)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1081, 
in pylucene_search
thr002: v = self.__search_context.search(query_string)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 913, in 
search
thr002: parsed_query = query_parser.parseQ(query)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 550, in 
parseQ
thr002: query = QueryParser.parse(self, querystring)
thr002: JavaError: org.apache.jcc.PythonException: getFieldQuery_quoted
thr002: AttributeError: getFieldQuery_quoted

thr002: Java stacktrace:
thr002: org.apache.jcc.PythonException: getFieldQuery_quoted
thr002: AttributeError: getFieldQuery_quoted

thr002: at 
org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery_quoted(Native
 Method)
thr002: at 
org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery(Unknown
 Source)
thr002: at 
org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1421)
thr002: at 
org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1309)
thr002: at 
org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237)
thr002: at 
org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
thr002: at 
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)


Bill


Re: Lucene Spatial Future

2011-04-06 Thread Smiley, David W.
On Apr 6, 2011, at 11:38 AM, Grant Ingersoll wrote:

 Until there is a specific patch that brings in and shows how JTS would be 
 incorporated (via reflection and as a totally optional piece, presumably, per 
 the ASF LGPL guidelines), there really isn't anything to vote on.

I think what is being asked to vote on is deprecation/removal of Lucene's 
spatial contrib module with its replacement being an externally hosted 
ASL-licensed module expressly designed to work with Lucene/Solr 4.0 and beyond 
(temporarily known as lucene-spatial-playground).  What would stay is the 
_basic_ spatial support that got into Lucene/Solr 3.1. Furthermore, no future 
spatial work would be accepted on Lucene/Solr aside from support of the basic 
capability.

This module isn't quite ready so perhaps the vote should wait till it is.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Inquiries on SOLR DEV Contribution (SOLR-2455)

2011-04-06 Thread Jeffrey Chang
Thanks Uwe.

I'll dig in more to see how I can help further now that I understand the
contribution process more.

Thanks,
Jeff

On Wed, Apr 6, 2011 at 3:19 PM, Uwe Schindler u...@thetaphi.de wrote:

  Hi Jeffry,



 You don’t have to do anything on this issue. I already assigned it to
 myself and I will commit your patch to 4.0 (trunk) and backport through
 simple merges.



 In general to bring fixes in, simply open issues, we will take care. If a
 fix is broken or not valid, somebody will notify you!



 Thanks for helping to improve Solr!



 Thanks!

 Uwe



 -

 Uwe Schindler

 H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de

 eMail: u...@thetaphi.de



 *From:* Jeffrey Chang [mailto:jclal...@gmail.com]
 *Sent:* Wednesday, April 06, 2011 8:51 AM
 *To:* dev@lucene.apache.org
 *Subject:* Inquiries on SOLR DEV Contribution (SOLR-2455)



 Hi All,



 I'd like to start small and see how I can contribute to SOLR development.



 By following http://wiki.apache.org/solr/HowToContribute, I've created a
 new defect (SOLR-2455) and created a patch for it.



 Not sure if I've done the right steps - can someone provide me some
 guidance if I'm on the right track to make some contributions?



 I'm still confused on how the committers decide which patch to include the
 fixes into. E.g. for the fixes I contribute, since I modified from Trunk,
 I'd assume it goes to SOLR 4.0.x?



 Also, should I modify the JIRA case status to Resolved myself?



 Thanks,

 Jeff



Re: Lucene Spatial Future

2011-04-06 Thread Ryan McKinley
 -1.  I *totally* understand if other people don't want JTS in the
 build system -- it is not a core concern to most people involved.

 Until there is a specific patch that brings in and shows how JTS would be 
 incorporated (via reflection and as a totally optional piece, presumably, per 
 the ASF LGPL guidelines), there really isn't anything to vote on.


fair point -- the optional logistics are working in a maven build.
I'm reluctant to convert to the ant build system if there is already
strong opposition to the idea.  If folks are OK with the idea, I will
happily make a concrete patch/branch that we could vote on.

So maybe I'm just looking for a POLL, not a vote -- to find out if this is
a non-starter or not (I am under the impression that it might be).

FYI, the optional support is now handled by a static
'SpatialContextProvider' that you can ask for a SpatialContext.  By
default it makes a SimpleSpatialContext -- if you set some system
properties, it uses reflection to load a different instance.
Eventually, this should be replaced with the standard java service
loader stuff (I think).

ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Bill Janssen
Bill Janssen jans...@parc.com wrote:

 I'm seeing parse failures on this query string:
 
 categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT 
 categories:RSSReader/_noexpire_

3.0.3 works just fine.

Bill


[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-04-06 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016464#comment-13016464
 ] 

David Smiley commented on SOLR-2438:


Nice, Peter. So why did you create another JIRA issue instead of putting your
patch on SOLR-219?  This is yet another issue, and there is already a
quasi-community of commenters (including me) on that other issue.

 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
 Attachments: SOLR-2438.patch


 This patch adds support to allow case-insensitive queries on wildcard 
 searches for configured TextField field types.
 This patch extends the excellent work done Yonik and Michael in SOLR-219.
 The approach here is different enough (imho) to warrant a separate JIRA issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Spatial Future

2011-04-06 Thread Grant Ingersoll

On Apr 6, 2011, at 12:06 PM, Smiley, David W. wrote:

 On Apr 6, 2011, at 11:38 AM, Grant Ingersoll wrote:
 
 Until there is a specific patch that brings in and shows how JTS would be 
 incorporated (via reflection and as a totally optional piece, presumably, 
 per the ASF LGPL guidelines), there really isn't anything to vote on.
 
 I think what is being asked to vote on is deprecation/removal of Lucene's 
 spatial contrib module


Just FYI, it is already deprecated in 3.x and slated for removal in 4.0.
Someone just needs to axe the appropriate bits (and either move what's needed
to Solr or to modules).

 with its replacement being an externally hosted ASL-licensed module expressly 
 designed to work with Lucene/Solr 4.0 and beyond (temporarily known as 
 lucene-spatial-playground).  What would stay is the _basic_ spatial support 
 that got into Lucene/Solr 3.1. Furthermore, no future spatial work would be 
 accepted on Lucene/Solr aside from support of the basic capability.

That is the piece I was wondering about and why I said yesterday it isn't 
likely to work, as it will just fork.  How do you tell people not to put in 
patches to L/S, especially when part of it is native and part of it isn't?
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: My GSOC proposal

2011-04-06 Thread Michael McCandless
That test code looks good -- you really should have seen awful
performance had you used O_DIRECT since you read byte by byte.

A more realistic test is to read a whole buffer (eg 4 KB is what
Lucene now uses during merging, but we'd probably up this to like 1 MB
when using O_DIRECT).
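
(For the curious, a rough Python/Linux-only sketch of the whole-buffer,
O_DIRECT style of reading -- the real work would be C/JNI, and block-size
alignment of offsets and lengths needs more care than shown here:)

    import mmap
    import os

    BUF_SIZE = 1024 * 1024                 # ~1 MB per read, as suggested above

    def read_direct(path):
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        f = os.fdopen(fd, "rb", 0)         # unbuffered file object
        buf = mmap.mmap(-1, BUF_SIZE)      # anonymous mmap: page-aligned buffer
        total = 0
        while True:
            n = f.readinto(buf)            # one big read, not byte-by-byte
            if not n:
                break
            total += n
        f.close()
        return total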

Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
for good reason: its existence means projects like ours can use it to
work around limitations in the Linux IO apis that control the buffer
cache when, otherwise, we might conceivably make patches to fix Linux
correctly.  It's an escape hatch, and we all use the escape hatch
instead of trying to fix Linux for real...

For example the NOREUSE flag is a no-op now in Linux, which is a
shame, because that's precisely the flag we'd want to use for merging
(along with SEQUENTIAL).  Had that flag been implemented well, it'd
give better results than our workaround using O_DIRECT.

Anyway, given how things are, until we can get more control (way
up in Javaland) over the buffer cache, O_DIRECT (via native directory
impl through JNI) is our only real option, today.

More details here:
http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html

Note that other OSs likely do a better job and actually implement
NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
would simply use NOREUSE on these platforms for I/O during segment
merging.
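
(Sketched with Python 3.3+, which exposes posix_fadvise, the hints above look
roughly like this -- merely illustrative, not the planned Lucene code:)

    import os

    def advise_for_merge(fd):
        size = os.fstat(fd).st_size
        # "I will read this sequentially" -> kernel may read ahead aggressively.
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_SEQUENTIAL)
        # "Don't keep these pages around for reuse" -- the ideal hint for
        # merging, but as noted above it is currently a no-op on Linux.
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_NOREUSE)

    # After the merge has consumed the file, cached pages can be dropped with:
    #     os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)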

Mike

http://blog.mikemccandless.com

On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
varunthacker1...@gmail.com wrote:
 Hi. I wrote a sample code to test out speed difference between SEQUENTIAL
 and O_DIRECT( I used the madvise flag-MADV_DONTNEED) reads .

 This is the link to the code: http://pastebin.com/8QywKGyS

 There was a speed difference when I switched between the two flags. I
 have not used the O_DIRECT flag because Linus had criticized it.

 Is this what the flags are intended to be used for ? This is just a sample
 code with a test file .

 On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 Hey Varun,
 On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Hi Varun,

 Those two issues would make a great GSoC!  Comments below...
 +1

 On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
 varunthacker1...@gmail.com wrote:

 I would like to combine two tasks as part of my project
 namely-Directory createOutput and openInput should take an IOContext
 (Lucene-2793) and complement it by Generalize DirectIOLinuxDir to
 UnixDir (Lucene-2795).

 The first part of the project is aimed at significantly reducing time
 taken to search during indexing by adding an IOContext which would
 store buffer size and have options to bypass the OS’s buffer cache
 (This is what causes the slowdown in search ) and other hints. Once
 completed I would move on to Lucene-2795 and generalize the Directory
 implementation to make a UnixDirectory .

 So, the first part (LUCENE-2793) should cause no change at all to
 performance, functionality, etc., because it's merely installing the
 plumbing (IOContext threaded throughout the low-level store APIs in
 Lucene) so that higher levels can send important details down to the
 Directory.  We'd fix IndexWriter/IndexReader to fill out this
 IOContext with the details (merging, flushing, new reader, etc.).

 There's some fun/freedom here in figuring out just what details should
 be included in IOContext... (eg: is it low level set buffer size to 4
 KB
 or is it high level I am opening a new near-real-time reader).

 This first step is a rote cutover, just changing APIs but in no way
 taking advantage of the new APIs.

 The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
 by creating a UnixDir impl that, using JNI (C code), passes advanced
 flags when opening files, based on the incoming IOContext.

 The goal is a single UnixDir that has ifdefs so that it's usable
 across multiple Unices, and eg would use direct IO if the context is
 merging.  If we are ambitious we could rope Windows into the mix, too,
 and then this would be NativeDir...

 We can measure success by validating that a big merge while searching
 does not hurt search performance?  (Ie we should be able to reproduce
 the results from
 http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).

 Thanks for the summary mike!

 I have spoken to Michael McCandless and Simon Willnauer about
 undertaking these tasks. Michael McCandless has agreed to mentor me.
 I would love to be able to contribute and learn from Apache Lucene
 community this summer. Also I would love suggestions on how to make my
 application proposal stronger.

 I think either Simon or I can be the official mentor, and then the
 other one of us (and other Lucene committers) will support/chime
 in...

 I will take the official responsibility here once we are there!
 simon

 This is an important change for Lucene!

 Mike

 

[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016471#comment-13016471
 ] 

Yonik Seeley commented on SOLR-2458:


part of the reason it works the way it does now is that when commit=true it 
POSTs a single commit at the end of multiple file POSTs, if we use the param 
based commit it would either need to specify commit on all of them, or keep 
track of the last one only add the param there.

Although only adding a commit on the last update should be easy, we could 
also just do it via the URL.  I believe posting ?commit=true to update 
handlers w/o a body works?


 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar

 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016471#comment-13016471
 ] 

Yonik Seeley edited comment on SOLR-2458 at 4/6/11 6:13 PM:


bq. part of the reason it works the way it does now is that when commit=true it 
POSTs a single commit at the end of multiple file POSTs, if we use the param 
based commit it would either need to specify commit on all of them, or keep 
track of the last one only add the param there.

Although only adding a commit on the last update should be easy, we could 
also just do it via the URL.  I believe posting ?commit=true to update 
handlers w/o a body works?


  was (Author: ysee...@gmail.com):
part of the reason it works the way it does now is that when commit=true it 
POSTs a single commit at the end of multiple file POSTs, if we use the param 
based commit it would either need to specify commit on all of them, or keep 
track of the last one only add the param there.

Although only adding a commit on the last update should be easy, we could 
also just do it via the URL.  I believe posting ?commit=true to update 
handlers w/o a body works?

  
 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar

 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016468#comment-13016468
 ] 

Hoss Man commented on SOLR-2458:



post.jar has hardcoded assumptions about what URL you want to hit and how it 
should behave -- if you want to change those assumptions there are documented 
params for changing it.  -Durl=... and -Dcommit=false.

if you want to post to something that isn't the XmlRequestHandler, you should 
specify -Dcommit=false, and then you can follow that with an explicit execution 
to commit...

java -Durl=... -jar post.jar *.csv
java -jar post.jar

part of the reason it works the way it does now is that when commit=true it 
POSTs a single commit at the end of multiple file POSTs; if we use the param 
based commit it would either need to specify commit on all of them, or keep 
track of the last one and only add the param there.

i don't object to changing post.jar to use a commit request param instead of 
sending the XML form, but this isn't a bug -- it's working as it was intended.


 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar

 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Spatial Future

2011-04-06 Thread Smiley, David W.

On Apr 6, 2011, at 2:08 PM, Grant Ingersoll wrote:
 with its replacement being an externally hosted ASL-licened module expressly 
 designed to work with Lucene/Solr 4.0 and beyond (temporarily known as 
 lucene-spatial-playground).  What would stay is the _basic_ spatial support 
 that got into Lucene/Solr 3.1. Furthermore, no future spatial work would be 
 accepted on Lucene/Solr aside from support of the basic capability.
 
 That is the piece I was wondering about and why I said yesterday it isn't 
 likely to work, as it will just fork.  How do you tell people not to put in 
 patches to L/S, especially when part of it is native and part of it isn't?

I think the risk of this is mitigated if the proposed external module is highly 
visible in L/S -- in other words, it's downloaded and packaged up as part of 
the distribution -- a jar, sitting along side the other contrib module jars (no 
JTS of course!).  Users would be referred to this module for non-basic spatial 
via the wiki and community in general. Of course I would prominently mention 
this module in the 2nd edition of my book ;-) which is well underway.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Smiley, David W.
On Apr 6, 2011, at 2:12 PM, Ryan McKinley wrote:

 [ ] OK with JTS compile dependency.  Spatial support should be a module
 [X] OK with JTS, but think this spatial stuff should happen elsewhere
 [ ] Please, no LGPL dependencies in lucene build

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Spatial Future

2011-04-06 Thread Yonik Seeley
On Wed, Apr 6, 2011 at 2:08 PM, Grant Ingersoll
grant.ingers...@gmail.com wrote:
 On Apr 6, 2011, at 12:06 PM, Smiley, David W. wrote:
 with its replacement being an externally hosted ASL-licened module expressly 
 designed to work with Lucene/Solr 4.0 and beyond (temporarily known as 
 lucene-spatial-playground).  What would stay is the _basic_ spatial support 
 that got into Lucene/Solr 3.1. Furthermore, no future spatial work would be 
 accepted on Lucene/Solr aside from support of the basic capability.

 That is the piece I was wondering about and why I said yesterday it isn't 
 likely to work, as it will just fork.  How do you tell people not to put in 
 patches to L/S, especially when part of it is native and part of it isn't?

Right - there's no need to try and make promises about the future.  It
seems unrelated to the questions at hand here.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.

2011-04-06 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016481#comment-13016481
 ] 

Dawid Weiss commented on SOLR-2378:
---

Oh, right -- I didn't peek at the inside of Builder.add(char[],...), but I will 
verify this by trying to add something that has multilingual stuff in it -- 
will update the patch tomorrow, hopefully. I would also love to have somebody 
who actually uses suggestions to try to compile it and use it on a production 
data set to see if my benchmark was approximately right with respect to the 
speed differences between the different available implementations.

 FST-based Lookup (suggestions) for prefix matches.
 --

 Key: SOLR-2378
 URL: https://issues.apache.org/jira/browse/SOLR-2378
 Project: Solr
  Issue Type: New Feature
  Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
  Labels: lookup, prefix
 Fix For: 4.0

 Attachments: SOLR-2378.patch


 Implement a subclass of Lookup based on finite state automata/ transducers 
 (Lucene FST package). This issue is for implementing a relatively basic 
 prefix matcher, we will handle infixes and other types of input matches 
 gradually. Impl. phases:
 - -write a DFA based suggester effectively identical to ternary tree based 
 solution right now,-
 - -baseline benchmark against tern. tree (memory consumption, rebuilding 
 speed, indexing speed; reuse Andrzej's benchmark code)-
 - -modify DFA to encode term weights directly in the automaton (optimize for 
 onlyMostPopular case)-
 - -benchmark again-
 - add infix suggestion support with prefix matches boosted higher (?)
 - benchmark again
 - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Robert Muir
On Wed, Apr 6, 2011 at 2:12 PM, Ryan McKinley ryan...@gmail.com wrote:

 Some may be following the thread on spatial development...  here is a
 quick summary, and a poll to help decide what may be the best next
 move.

 I'm hoping to introduce a high level spatial API that can be used for
 a variety of indexing strategies and computational needs.  For simple
 point in BBox and point in WGS84 radius, this does not require any
 external libraries.  To support more complex queries -- point in
 polygon, complex geometry intersections, etc -- we need an LGPL
 library JTS.  The LGPL dependency is only needed to compile/test,
 there is no runtime requirement for JTS.  To enable the more
 complicated options you would need to add JTS to the classpath and
 perhaps set an environment variable.  This is essentially what we are
 now doing with the (soon to be removed) bdb contrib.

 I am trying to figure out the best home for this code and development
 to live.  I think it is essential for the JTS support to be part of
 the core build/test -- splitting it into a separate module that is
 tested elsewhere is not an option.  This raises the basic question of
 whether people are willing to have the LGPL build dependency as part of the
 main lucene build.  I think it is, but am sympathetic to the idea that
 it might not be.


I'm sorta confused about this (i'll probably offend someone here, but so be
it)

We have a contrib module for spatial that is experimental, people want to
deprecate, and say has problems.
Why must the super-expert-polygon stuff sit with the basic capability that
probably most users want: the ability to do basic searches (probably in
combination with text too) in their app?

It's hard for me to tell, i hope the reason isn't elegance, but why aren't
we working on making a simple, supported, 80-20 case in lucene that
non-spatial-gurus (and users) understand and can maintain... then it would
seem ideal for the complex stuff to be outside of this project with any
dependencies it wants?

Users are probably really confused about the spatial situation: is it
because we are floundering around this expert stuff


Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Ryan McKinley
On Wed, Apr 6, 2011 at 2:39 PM, Grant Ingersoll
grant.ingers...@gmail.com wrote:

 I don't see why a compile/test dependency is needed at all:
 We provide a factory based spatial module where one specifies a
 SpatialProvider.  We have our own implementation of that which works for
 some set (or all) of the features.   An external project (Apache Extras?)

This is the non-starter for me.  This would split the dev across
multiple places and mean that the implementations I use (JTS) would
not be a first class citizen in testing.

This is the point of the whole debate... and why i think elsewhere may
be a better option.

ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Grant Ingersoll

On Apr 6, 2011, at 2:44 PM, Ryan McKinley wrote:

 On Wed, Apr 6, 2011 at 2:39 PM, Grant Ingersoll
 grant.ingers...@gmail.com wrote:
 
 I don't see why we need a compile/test dependency is needed at all:
 We provide a factory based spatial module where one specifies a
 SpatialProvider.  We have our own implementation of that which works for
 some set (or all) of the features.   An external project (Apache Extras?)
 
 This is the non-starter for me.  This would split the dev across
 multiple places and mean that the implementations I use (JTS) would
 not be a first class citizen in testing.
 
 This is the point of the whole debate... and why i think elsewhere may
 be a better option.


That's a bit contradictory, though, isn't it?  By definition, elsewhere means 
split too, b/c we have stated the point search stuff isn't going anywhere.  And 
even if it does, you will still need to have a separate factory based 
implementation and ship a non-JTS provider, otherwise none of it can be 
packaged into a L/S release, so it's still the same amount of work. 
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Robert Muir
On Wed, Apr 6, 2011 at 2:54 PM, Ryan McKinley ryan...@gmail.com wrote:

 The code can be separated so that the dependencies are as you
 suggest -- i have done this, but it makes testing more difficult and
 less robust.  As part of the framework I've introduced a robust way to
 use the same data and tests with different strategies and
 implementations.  For me to work on it, i need the stuff i use to be a
 first class citizen in testing.


Right, but this creates a problem for our testing too: if we open this
can of worms with optional LGPL stuff I think it's going to actually
complicate build and testing.
I already stated my concerns about this here: http://s.apache.org/vE

I don't think the bdb should be used as justification that the
can of worms is already open. Personally I didn't realize the
license it had, and for these same reasons, when i found this out i
put up a patch on Grant's issue.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6805 - Failure

2011-04-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6805/

1 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin

Error Message:
expected:<1> but was:<0>

Stack Trace:
junit.framework.AssertionFailedError: expected:<1> but was:<0>
at 
org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:380)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)




Build Log (for compile errors):
[...truncated 8713 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Ryan McKinley
On Wed, Apr 6, 2011 at 2:48 PM, Grant Ingersoll
grant.ingers...@gmail.com wrote:

 On Apr 6, 2011, at 2:44 PM, Ryan McKinley wrote:

 On Wed, Apr 6, 2011 at 2:39 PM, Grant Ingersoll
 grant.ingers...@gmail.com wrote:

 I don't see why we need a compile/test dependency is needed at all:
 We provide a factory based spatial module where one specifies a
 SpatialProvider.  We have our own implementation of that which works for
 some set (or all) of the features.   An external project (Apache Extras?)

 This is the non-starter for me.  This would split the dev across
 multiple places and mean that the implementations I use (JTS) would
 not be a first class citizen in testing.

 This is the point of the whole debate... and why i think elsewhere may
 be a better option.


 That's a bit contradictory, though, isn't it?  By definition, elsewhere means 
 split too,

I'm looking at the proposed spatial strategy stuff as a unit.  It is
obviously related to existing stuff, but is a very different thing.


 b/c we have stated the point search stuff isn't going anywhere.

Agree -- i think the two would live happily together.  Parts of
existing point stuff may be deprecated if that seems appropriate. But
other parts -- especially the general vector based function queries
would never map to a high level spatial API anyway.


And even if it does, you will still need to have a separate factory based 
implementation and ship a non-JTS provider, otherwise none of it can be 
packaged into a L/S release, so it's still the same amount of work.

IIUC, we can distribute classes that were compiled against the JTS
API, but not JTS itself.  People could register what provider should
get used and if JTS is available, it would load that one via
reflection.

ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Ryan McKinley
On Wed, Apr 6, 2011 at 3:01 PM, Robert Muir rcm...@gmail.com wrote:
 On Wed, Apr 6, 2011 at 2:54 PM, Ryan McKinley ryan...@gmail.com wrote:

 The code can be separated so that the the dependencies are as you
 suggest -- i have done this, but it makes testing more difficult and
 less robust.  As part of the framework I've introduced a robust way to
 use the same data and and tests with different strategies and
 implementations.  For me to work on it, i need the stuff i use to be a
 first class citizen in testing.


 Right, but this creates a problem for our testing too: if we open this
 can of worms with optional LGPL stuff I think its going to actually
 complicate build and testing.
 I already stated my concerns about this here: http://s.apache.org/vE

 I don't think the bdb should be used as justification already that the
 can of worms is already open. Personally I didn't realize the
 license it had, and for these same reasons, when i found this out i
 put up a patch on Grant's issue.


I totally agree -- this was my preface to the whole discussion, and
why i think it may be more appropriate to move spatial dev to an
environment that can have different compile time choices.

I'd like to figure out a way that this is a win for everyone -- this is
why i'm bothering with the prolonged discussion so that at least
motivations are clear and all that.

ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Spatial Future

2011-04-06 Thread Ryan McKinley

 Right - there's no need to try and make promises about the future.  It
 seems unrelated to the questions at hand here.


To be clear... I don't see any of this as promises -- obviously
nothing happens until there is something concrete to evaluate.

The point of this thread (for me anyway) is to raise my concerns, see
what people are thinking, and be transparent about my choices.

This discussion has made me feel like the right choice (for me) is to
pursue spatial development somewhere else -- likely osgeo -- and down
the road figure out how that could/should fit with solr.

ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Andi Vajda


 Hi Bill,

The QueryParser class changed a bit. More overloads were introduced on the 
Lucene side. You probably have a Python 'subclass' of QueryParser that needs 
a bit of work to adapt to the changes.


Look at the new version in 
apache/pylucene-3.1/java/org/apache/pylucene/queryParser/PythonQueryParser.java 
and see the native methods that you're missing on your Python 
implementation. Also take a look at test/test_PythonQueryParser.py for an 
example on what the new methods look like (hint: getFieldQuery_quoted()).
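
For example, a rough sketch of what such a subclass might look like (the
constructor signature and method names follow the hints above; the bodies just
build a plain TermQuery for illustration and are not the real UpLib code):

    from lucene import (PythonQueryParser, StandardAnalyzer, Term, TermQuery,
                        Version, initVM)

    initVM()

    class MyQueryParser(PythonQueryParser):
        # Lucene 3.1 split getFieldQuery into _quoted and _slop variants, so
        # the Python subclass has to define the new native names it can be
        # called on.
        def getFieldQuery_quoted(self, field, queryText, quoted):
            return TermQuery(Term(field, queryText.lower()))

        def getFieldQuery_slop(self, field, queryText, slop):
            return self.getFieldQuery_quoted(field, queryText, False)

    parser = MyQueryParser(Version.LUCENE_CURRENT, "contents",
                           StandardAnalyzer(Version.LUCENE_CURRENT))

To keep the stock behaviour instead, delegate to the superclass variant shown
in PythonQueryParser.java rather than building the query by hand.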


With a default QueryParser instance, your query parses just fine:

   >>> from lucene import *
   >>> initVM()
   <jcc.JCCEnv object at 0x10029d0f0>
   >>> qp = QueryParser(Version.LUCENE_CURRENT, "foo",
   ...                  StandardAnalyzer(Version.LUCENE_CURRENT))
   >>> qp.parse("categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT categories:RSSReader/_noexpire_")
   <Query: +categories:rssreader +id:[0-00--000 TO 01299-51-3142-795] -(categories:rssreader categories:_noexpire_)>

Andi..

On Wed, 6 Apr 2011, Bill Janssen wrote:


I'm seeing parse failures on this query string:

categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND NOT 
categories:RSSReader/_noexpire_

thr002: RSSReader:  Traceback (most recent call last):
thr002:   File /local/lib/UpLib-1.7.11/site-extensions/RSSReader.py, line 
271, in _scan_rss_sites
thr002: hits = repo.do_query(categories:RSSReader AND id:[0-00--000 TO 
%s] AND NOT categories:RSSReader/_noexpire_ % old_id)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1172, 
in do_query
thr002: results = self.do_full_query(query_string, searchtype)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1196, 
in do_full_query
thr002: results = self.pylucene_search(searchtype, query_string)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 1081, 
in pylucene_search
thr002: v = self.__search_context.search(query_string)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 913, in 
search
thr002: parsed_query = query_parser.parseQ(query)
thr002:   File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 550, in 
parseQ
thr002: query = QueryParser.parse(self, querystring)
thr002: JavaError: org.apache.jcc.PythonException: getFieldQuery_quoted
thr002: AttributeError: getFieldQuery_quoted

thr002: Java stacktrace:
thr002: org.apache.jcc.PythonException: getFieldQuery_quoted
thr002: AttributeError: getFieldQuery_quoted

thr002: at 
org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery_quoted(Native
 Method)
thr002: at 
org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery(Unknown
 Source)
thr002: at 
org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1421)
thr002: at 
org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1309)
thr002: at 
org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237)
thr002: at 
org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
thr002: at 
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)


Bill



Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Grant Ingersoll

On Apr 6, 2011, at 2:54 PM, Ryan McKinley wrote:

 I'm sorta confused about this (i'll probably offend someone here, but so be 
 it)
 
 Don't worry
 
 
 Its hard for me to tell, i hope the reason isn't elegance, but why aren't
 we working on making a simple,supported,80-20 case in lucene that
 non-spatial-gurus (and users) understand and can maintain...
 
 for me it is all about testing and development.
 
 For my needs I can't use the simple stuff, and *need* the features
 that many users won't care about.  I have not done any work on the
 existing spatial contrib because it does not meet my needs.
 
 The code can be separated so that the the dependencies are as you
 suggest -- i have done this, but it makes testing more difficult and
 less robust.  As part of the framework I've introduced a robust way to
 use the same data and and tests with different strategies and
 implementations.  For me to work on it, i need the stuff i use to be a
 first class citizen in testing.

I don't follow why testing is any harder.  The core interfaces and baseline 
implementation (along w/ point search) are tested here.  The JTS project does 
its own tests.  You can certainly, on your machine, run the tests together.
As I voted earlier, I think we should just define the interfaces here along w/ 
a baseline implementation that meets the 80/20 rule and the JTS project (or 
whatever else) lives somewhere else.  I just don't see any valid way to bring 
in a compile/test dependency on JTS that we can support as a first class 
citizen, but that doesn't mean we can't support the framework which makes it 
easy to drop in and test on an individual's machine.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-04-06 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016500#comment-13016500
 ] 

Stefan Matheis (steffkes) commented on SOLR-2399:
-

So, long time silence .. :)

I've updated my github repo with a few things .. fixes and also a new 
threaddump-list, which was originally created by Upayavira, thanks!

Also started a new Wiki-Page [http://wiki.apache.org/solr/ReworkedSolrAdminGUI] 
- i'm not really good at marketing, so the Page is really basic and everybody 
is invited to update it.

As Upayavira stated last Tuesday there are still a few Things missing, compared 
to the current Admin-UI .. but i'd like to know: Which are the Features that 
*you* will need to give the reworked UI a try? One of the listed features? Or, 
will it be easier, if the Code would work from /solr/admin?

Please let me know -- Feedback is really appreciated :)

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 4.0


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin 
 [This commit shows the 
 differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
  between old/existing index.jsp and my new one (which is could 
 copy-cut/paste'd from the existing one).
 Main Action takes place in 
 [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
  which is actually neither clean nor pretty .. just work-in-progress.
 Actually it's Work in Progress, so ... give it a try. It's developed with 
 Firefox as Browser, so, for a first impression .. please don't use _things_ 
 like Internet Explorer or so ;o
 Jan already suggested a bunch of good things, i'm sure there are more ideas 
 over there :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2400) FieldAnalysisRequestHandler; add information about token-relation

2011-04-06 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016502#comment-13016502
 ] 

Stefan Matheis (steffkes) commented on SOLR-2400:
-

I've checked out the current trunk-Revision .. but could not see any change on 
that, especially the raw-Term thing. Did i miss something else? Special Setting 
required for getting this property?

 FieldAnalysisRequestHandler; add information about token-relation
 -

 Key: SOLR-2400
 URL: https://issues.apache.org/jira/browse/SOLR-2400
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Stefan Matheis (steffkes)
Priority: Minor
 Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 
 110303_FieldAnalysisRequestHandler_view.png


 The XML-Output (simplified example attached) is missing one small piece of 
 information, which could be very useful to build a nice Analysis-Output, and 
 that's Token-Relation (if there is a special/correct word for this, please 
 correct me).
 Meaning, that it is actually not possible to follow the Analysis-Process 
 (completely) when the Tokenizers/Filters drop Tokens (f.e. StopWord) 
 or split them into multiple Tokens (f.e. WordDelimiter).
 Would it be possible to include this Information? If so, it would be possible 
 to create an improved Analysis-Page for the new Solr Admin (SOLR-2399) - 
 short scribble attached

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-04-06 Thread Nikola Tankovic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016504#comment-13016504
 ] 

Nikola Tankovic commented on LUCENE-2308:
-

Hi folks, 

I wrote a GSoC proposal for this issue, but I am missing a mentor. Any 
volunteers? :)

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2459) LogLevelSelection Servlet outputs plain HTML

2011-04-06 Thread Stefan Matheis (steffkes) (JIRA)
LogLevelSelection Servlet outputs plain HTML


 Key: SOLR-2459
 URL: https://issues.apache.org/jira/browse/SOLR-2459
 Project: Solr
  Issue Type: Wish
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Priority: Trivial


The currently available Output of the LogLevelSelection Servlet is plain HTML, 
which made it impossible to integrate the Logging-Information in the new 
Admin-UI. Format-Agnostic Output (like every [?] other Servlet offers) would be 
really nice!

Just as an Idea for a future structure, the new admin-ui is 
[https://github.com/steffkes/solr-admin/blob/master/logging.json|actually based 
on that json-structure] :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-04-06 Thread Nikola Tankovic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016522#comment-13016522
 ] 

Nikola Tankovic commented on LUCENE-2308:
-

I submitted the first draft of my proposal (LUCENE-2308 Separately specify a 
field's type), hope you can see it and give me some further pointers if needed. 
Thank you!

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016527#comment-13016527
 ] 

Jan Høydahl commented on SOLR-2458:
---

It might not be a bug according to the original design intentions. But the 
first thing we tell users is to try out post.jar to post stuff, and now we've 
even included csv and json examples for it. Then it's unnecessary to get the 
error "Solr returned an error #400 undefined field <commit/>" thrown in your 
face - the error does not even explain the problem.

I'll try to assemble a first patch for this next week some time, adding a 
separate POST with ?commit=true after the last file is POSTed.
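
A sketch of that follow-up request (Python, with a hypothetical stock example
URL; whether every update handler accepts an empty body is still to be
verified):

    import urllib.request

    def commit(url="http://localhost:8983/solr/update"):
        # One extra POST with commit=true on the URL and an empty body,
        # sent after the last file has been posted.
        req = urllib.request.Request(url + "?commit=true", data=b"",
                                     headers={"Content-Type": "text/xml"})
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()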

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
  Labels: post.jar

 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using non-XML requesthandler, such as CSV.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Bill Janssen
Andi Vajda va...@apache.org wrote:

 
  Hi Bill,
 
 The QueryParser class changed a bit. More overloads were introduced on
 the Lucene side. You probably have a Python 'subclass' of QueryParser
 that needs a bit of work to adapt to the changes.

Thanks, but...   All that adds up to breakage for my users.

Bill


Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Earwin Burrfoot
On Wed, Apr 6, 2011 at 22:43, Robert Muir rcm...@gmail.com wrote:
 On Wed, Apr 6, 2011 at 2:12 PM, Ryan McKinley ryan...@gmail.com wrote:
 Some may be following the thread on spatial development...  here is a
 quick summary, and a poll to help decide what may be the best next
 move.

 I'm hoping to introduce a high level spatial API that can be used for
 a variety of indexing strategies and computational needs.  For simple
 point in BBox and point in WGS84 radius, this does not require any
 external libraries.  To support more complex queries -- point in
 polygon, complex geometry intersections, etc -- we need an LGPL
 library JTS.  The LGPL dependency is only needed to compile/test,
 there is no runtime requirement for JTS.  To enable the more
 complicated options you would need to add JTS to the classpath and
 perhaps set a environment variable.  This is essentially what we are
 now doing with the (soon to be removed) bdb contrib.

 I am trying to figure out the best home for this code and development
 to live.  I think it is essential for the JTS support to be part of
 the core build/test -- splitting it into a separate module that is
 tested elsewhere is not an option.  This raises the basic question of
 if people are willing to have the LGPL build dependency as part of the
 main lucene build.  I think it is, but am sympathetic to the idea that
 it might not be.

 I'm sorta confused about this (i'll probably offend someone here, but so be
 it)
 We have a contrib module for spatial that is experimental, people want to
 deprecate, and say has problems.
 Why must the super-expert-polygon stuff sit with the basic capability that
 probably most users want: the ability to do basic searches (probably in
 combination with text too) in their app?
 Its hard for me to tell, i hope the reason isn't elegance, but why aren't
 we working on making a simple,supported,80-20 case in lucene that
 non-spatial-gurus (and users) understand and can maintain... then it would
 seem ideal for the complex stuff to be outside of this project with any
 dependencies it wants?
 Users are probably really confused about the spatial situation: is it
 because we are floundering around this expert stuff

Handling Unicode code points outside of BMP is highly expert stuff as
well. And is totally unneeded by 80% of the users for any other reason
except elegance. I think you two guys can really understand each
other here : )

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Robert Muir
On Wed, Apr 6, 2011 at 5:07 PM, Earwin Burrfoot ear...@gmail.com wrote:

 Handling Unicode code points outside of BMP is highly expert stuff as
 well. And is totally unneeded by 80% of the users for any other reason
 except elegance. I think you two guys can really understand each
 other here : )


you are wrong: you either support unicode, or your application is
buggy. It's not an optional feature, it's the text standard used by the
java programming language.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Andi Vajda



On Wed, 6 Apr 2011, Bill Janssen wrote:


Andi Vajda va...@apache.org wrote:



 Hi Bill,

The QueryParser class changed a bit. More overloads were introduced on
the Lucene side. You probably have a Python 'subclass' of QueryParser
that needs a bit of work to adapt to the changes.


Thanks, but...   All that adds up to breakage for my users.


Unless I'm missing something here, you've got two options before you break 
your users:

  1. fix your code before you ship it to them
  2. don't upgrade

Yes, you could say that the same applies to PyLucene, of course :-)

I'm not exactly sure what kind of backwards compat promises Lucene Java made 
going from 3.0 to 3.1 but the new QueryParser method overloads and the fact 
that there is no support for method overloads in Python make 
PythonQueryParser a bit stuck between a rock and a hard place. If you see a 
better way to fix the mess with the _quoted and _slop variants for 
getFieldQuery, a patch is welcome.


Andi..


Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Bill Janssen
Andi Vajda va...@apache.org wrote:

 Look at the new version in
 apache/pylucene-3.1/java/org/apache/pylucene/queryParser/PythonQueryParser.java
 and see the native methods that you're missing on your Python

Wow, looks like a lot.  My implementations just have implementations of
getFieldQuery() and getRangeQuery().

 implementation. Also take a look at test/test_PythonQueryParser.py for
 an example on what the new methods look like (hint:
 getFieldQuery_quoted()).

Looking at that, it seems that one needn't provide implementations for
most of the native methods -- your example classes don't.  How should one
know which to implement?  The ones that could get called, I suppose.

Bill


 
 With a default QueryParser instance, your query parses just fine:
 
from lucene import *
initVM()
   jcc.JCCEnv object at 0x10029d0f0
qp = QueryParser(Version.LUCENE_CURRENT, foo, 
 StandardAnalyzer(Version.LUCENE_CURRENT))
qp.parse(categories:RSSReader AND id:[0-00--000 TO 
 01299-51-3142-795] AND NOT categories:RSSReader/_noexpire_)
   Query: +categories:rssreader +id:[0-00--000 TO 01299-51-3142-795] 
 -(categories:rssreader categories:_noexpire_)
 
 Andi..
 
 On Wed, 6 Apr 2011, Bill Janssen wrote:
 
  I'm seeing parse failures on this query string:
 
  categories:RSSReader AND id:[0-00--000 TO 01299-51-3142-795] AND 
  NOT categories:RSSReader/_noexpire_
 
  thr002: RSSReader:  Traceback (most recent call last):
  thr002:   File /local/lib/UpLib-1.7.11/site-extensions/RSSReader.py, line 
  271, in _scan_rss_sites
  thr002: hits = repo.do_query(categories:RSSReader AND 
  id:[0-00--000 TO %s] AND NOT categories:RSSReader/_noexpire_ % 
  old_id)
  thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 
  1172, in do_query
  thr002: results = self.do_full_query(query_string, searchtype)
  thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 
  1196, in do_full_query
  thr002: results = self.pylucene_search(searchtype, query_string)
  thr002:   File /local/share/UpLib-1.7.11/code/uplib/repository.py, line 
  1081, in pylucene_search
  thr002: v = self.__search_context.search(query_string)
  thr002:   File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 
  913, in search
  thr002: parsed_query = query_parser.parseQ(query)
  thr002:   File /local/share/UpLib-1.7.11/code/uplib/indexing.py, line 
  550, in parseQ
  thr002: query = QueryParser.parse(self, querystring)
  thr002: JavaError: org.apache.jcc.PythonException: getFieldQuery_quoted
  thr002: AttributeError: getFieldQuery_quoted
 
  thr002: Java stacktrace:
  thr002: org.apache.jcc.PythonException: getFieldQuery_quoted
  thr002: AttributeError: getFieldQuery_quoted
 
  thr002: at 
  org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery_quoted(Native
   Method)
  thr002: at 
  org.apache.pylucene.queryParser.PythonMultiFieldQueryParser.getFieldQuery(Unknown
   Source)
  thr002: at 
  org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1421)
  thr002: at 
  org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1309)
  thr002: at 
  org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237)
  thr002: at 
  org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
  thr002: at 
  org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
 
 
  Bill
 

repl: bad addresses:
pylucene-...@lucene.apache.org Andi Vajda va...@apache.org -- junk 
after local@domain (Andi)


Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Bill Bell
I love this idea!

Bill Bell
Sent from mobile


On Apr 6, 2011, at 2:39 PM, Grant Ingersoll grant.ingers...@gmail.com wrote:

 
 I don't see why a compile/test dependency is needed at all:
 We provide a factory based spatial module where one specifies a 
 SpatialProvider.  We have our own implementation of that which works for some 
 set (or all) of the features.   An external project (Apache Extras?) could 
 then go and implement that provider using JTS and can easily leverage all of 
 our existing tests as well as its own (using the handy-dandy test 
 framework).  Users who wish to use this would then simply include the 
 external JAR (accepting that it is LGPL on their own free will) and telling 
 L/S to use a different Provider.  I thought this is what you already 
 proposed.  This allows innovation on our stuff (which may well surpass JTS at 
 some point) as well as satisfies the short term win of JTS w/o violating ASF 
 legal issues (per http://www.apache.org/legal/3party.html#options-optional).  
 It would also make it easy for SIS to add it's own provider if and when it is 
 mature enough.
 
 -Grant
 
 
 On Apr 6, 2011, at 2:12 PM, Ryan McKinley wrote:
 
 [] OK with JTS compile dependency.  Spatial support should be a module
 [] OK with JTS, but think this spatial stuff should happen elsewhere
 [] Please, no LGPL dependencies in lucene build
 
 [x] Please no LGPL in Lucene build, please keep spatial framework here, 
 please implement JTS piece in Apache Extras per a well-defined (and hosted in 
 Lucene) SpatialProvider/Factory mechanism that is completely pluggable.  
 Compile dependency is in "JTS needs Lucene spatial module", not "Lucene spatial 
 module needs JTS".  :-)
 
 -Grant


Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Earwin Burrfoot
On Thu, Apr 7, 2011 at 01:11, Robert Muir rcm...@gmail.com wrote:
 On Wed, Apr 6, 2011 at 5:07 PM, Earwin Burrfoot ear...@gmail.com wrote:

 Handling Unicode code points outside of BMP is highly expert stuff as
 well. And is totally unneeded by 80% of the users for any other reason
 except elegance. I think you two guys can really understand each
 other here : )


 you are wrong: you either support unicode, or your application is
 buggy. Its not an optional feature, its the text standard used by the
 java programming language.

You either handle the Earth as a proper somewhat-ellipsoid, or
your application is buggy. It's not an optional feature, it's even
stronger than a standard - it is a physical fact experienced by all of
us, earthlings.

Though 80% of the users can throw geoids and unicode planes out of the
window and live happily with some stupid local coordinate system and
two-byte characters (some even manage with one-byte!). Yeah, they
don't really care about being buggy in any geo/unicode-zealot's eyes.

Having said that, it's cool that people like you two exist :) Because
earth is round, maps are ugly, there are lots of different writing
systems and someone has to deal with that.

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr-2242

2011-04-06 Thread Bill Bell
Solr-2242 inquiry. Who is going to help me get this committed? Issues?

Bill Bell
Sent from mobile


On Apr 6, 2011, at 2:51 AM, Jeffrey Chang jclal...@gmail.com wrote:

 Hi All,
  
 I'd like to start small and see how I can contribute to SOLR development.
  
 By following http://wiki.apache.org/solr/HowToContribute, I've created a new 
 defect (SOLR-2455) and created a patch for it.
  
 Not sure if I've done the right steps - can someone provide me some guidance 
 if I'm on the right track to make some contributions?
  
 I'm still confused on how the committers decide which patch to include the 
 fixes into. E.g. for the fixes I contribute, since I modified from Trunk, I'd 
 assume it goes to SOLR 4.0.x?
  
 Also, should I modify JIRA csae status to Resolve myself?
  
 Thanks,
 Jeff


Indexing Non-Textual Data

2011-04-06 Thread Chris Spencer
Hi,

I'm new to PyLucene, so forgive me if this is a newbie question. I have a
dataset composed of several thousand lists of 128 integer features, each
list associated with a class label. Would it be possible to use Lucene as a
classifier, by indexing the label with respect to these integer features,
and then classifying a new list by finding the most similar labels with Lucene?

I've been going through the PyLucene samples, but they only seem to involve
indexing text, not continuous features (understandably). Could anyone point
me to an example that indexes non-textual data?

I think the project Lire (http://www.semanticmetadata.net/lire/) is using
Lucene to do something similar to this, although with an emphasis on image
features. I've dug into their code a little, but I'm not a strong Java
programmer, so I'm not sure how they're pulling it off, nor how I might
translate this into the PyLucene API. In your opinion, is this a practical
use of Lucene?
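
One way to sketch the idea with the 3.x API (hedged: the class names are the
standard PyLucene ones, but treat this as an untested illustration, not a
recommendation) is to turn each feature into a synthetic "word" so Lucene's
term-overlap scoring approximates a nearest-label lookup:

    from lucene import (initVM, Document, Field, IndexSearcher, IndexWriter,
                        QueryParser, RAMDirectory, Version, WhitespaceAnalyzer)

    initVM()
    directory = RAMDirectory()
    analyzer = WhitespaceAnalyzer(Version.LUCENE_CURRENT)
    writer = IndexWriter(directory, analyzer, True,
                         IndexWriter.MaxFieldLength.UNLIMITED)

    def feature_text(features):
        # "f17_204" encodes position and value, so feature 17 in one vector
        # only matches feature 17 in another.
        return " ".join("f%d_%d" % (i, v) for i, v in enumerate(features))

    def add_example(label, features):
        doc = Document()
        doc.add(Field("label", label, Field.Store.YES, Field.Index.NOT_ANALYZED))
        doc.add(Field("features", feature_text(features),
                      Field.Store.NO, Field.Index.ANALYZED))
        writer.addDocument(doc)

    add_example("cat", [12, 7, 204] + [0] * 125)
    add_example("dog", [11, 9, 180] + [0] * 125)
    writer.close()

    def classify(features, n=5):
        searcher = IndexSearcher(directory)
        query = QueryParser(Version.LUCENE_CURRENT, "features",
                            analyzer).parse(feature_text(features))
        hits = searcher.search(query, n)
        return [searcher.doc(sd.doc).get("label") for sd in hits.scoreDocs]

    print(classify([12, 7, 204] + [0] * 125))   # most similar labels first

This only captures exact feature matches; for truly continuous features you
would want to quantize the values first (or look at how Lire builds its terms).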

Regards,
Chris


Re: My GSOC proposal

2011-04-06 Thread Varun Thacker
I have drafted the proposal on the official GSoC website. This is the link
to my proposal: http://goo.gl/uYXrV . Please do let me know if anything needs
to be changed, added or removed.

I will keep on working on it till the deadline on the 8th.

On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 That test code looks good -- you really should have seen awful
 performance had you used O_DIRECT since you read byte by byte.

 A more realistic test is to read a whole buffer (eg 4 KB is what
 Lucene now uses during merging, but we'd probably up this to like 1 MB
 when using O_DIRECT).

 Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
 for good reason: its existence means projects like ours can use it to
 work around limitations in the Linux IO apis that control the buffer
 cache when, otherwise, we might conceivably make patches to fix Linux
 correctly.  It's an escape hatch, and we all use the escape hatch
 instead of trying to fix Linux for real...

 For example the NOREUSE flag is a no-op now in Linux, which is a
 shame, because that's precisely the flag we'd want to use for merging
 (along with SEQUENTIAL).  Had that flag been implemented well, it'd
 give better results than our workaround using O_DIRECT.

  Anyway, given how things are, until we can get more control (way
 up in Javaland) over the buffer cache, O_DIRECT (via native directory
 impl through JNI) is our only real option, today.

 More details here:
 http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html

 Note that other OSs likely do a better job and actually implement
 NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
 would simply use NOREUSE on these platforms for I/O during segment
 merging.

 Mike

 http://blog.mikemccandless.com

 On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
 varunthacker1...@gmail.com wrote:
   Hi. I wrote sample code to test the speed difference between SEQUENTIAL
   and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
 
  This is the link to the code: http://pastebin.com/8QywKGyS
 
   There was a speed difference when I switched between the two flags. I
   have not used the O_DIRECT flag because Linus had criticized it.
 
   Is this what the flags are intended to be used for? This is just sample
   code with a test file.
 
  On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
  simon.willna...@googlemail.com wrote:
  Hey Varun,
  On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
  luc...@mikemccandless.com wrote:
  Hi Varun,
 
  Those two issues would make a great GSoC!  Comments below...
  +1
 
  On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
  varunthacker1...@gmail.com wrote:
 
  I would like to combine two tasks as part of my project
  namely-Directory createOutput and openInput should take an IOContext
  (Lucene-2793) and compliment it by Generalize DirectIOLinuxDir to
  UnixDir (Lucene-2795).
 
  The first part of the project is aimed at significantly reducing time
  taken to search during indexing by adding an IOContext which would
  store buffer size and have options to bypass the OS’s buffer cache
  (This is what causes the slowdown in search ) and other hints. Once
  completed I would move on to Lucene-2795 and generalize the Directory
  implementation to make a UnixDirectory .
 
  So, the first part (LUCENE-2793) should cause no change at all to
  performance, functionality, etc., because it's merely installing the
  plumbing (IOContext threaded throughout the low-level store APIs in
  Lucene) so that higher levels can send important details down to the
  Directory.  We'd fix IndexWriter/IndexReader to fill out this
  IOContext with the details (merging, flushing, new reader, etc.).
 
  There's some fun/freedom here in figuring out just what details should
  be included in IOContext... (eg: is it low level set buffer size to 4
  KB
  or is it high level I am opening a new near-real-time reader).
 
  This first step is a rote cutover, just changing APIs but in no way
  taking advantage of the new APIs.
 
  The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
  by creating a UnixDir impl that, using JNI (C code), passes advanced
  flags when opening files, based on the incoming IOContext.
 
  The goal is a single UnixDir that has ifdefs so that it's usable
  across multiple Unices, and eg would use direct IO if the context is
  merging.  If we are ambitious we could rope Windows into the mix, too,
  and then this would be NativeDir...
 
  We can measure success by validating that a big merge while searching
  does not hurt search performance?  (Ie we should be able to reproduce
  the results from
  http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
 ).
 
  Thanks for the summary mike!
 
  I have spoken to Micheal McCandless and Simon Willnauer about
  undertaking these tasks. Micheal McCandless has agreed to mentor me .
  I would love to be able to contribute 
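
A purely illustrative sketch of the IOContext plumbing discussed in this
thread -- none of these names are committed Lucene API; they only show the
kind of information such an object could carry and how a native Directory
might react to it:

  // Hypothetical sketch only -- not existing Lucene code.
  public final class IOContext {
    public enum Context { MERGE, FLUSH, READ, DEFAULT }

    public final Context context;   // high-level hint: why the file is being opened
    public final int bufferSize;    // low-level hint, e.g. 4 KB for merging

    public IOContext(Context context, int bufferSize) {
      this.context = context;
      this.bufferSize = bufferSize;
    }
  }

  // Directory would then grow overloads such as:
  //   IndexOutput createOutput(String name, IOContext context)
  //   IndexInput  openInput(String name, IOContext context)
  // and a native UnixDirectory could, for example, switch to direct IO
  // (O_DIRECT / posix_fadvise via JNI) whenever context == Context.MERGE.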

[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-04-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016571#comment-13016571
 ] 

Simon Willnauer commented on LUCENE-2308:
-

bq. I wrote a GSoC proposal for this issue, but I am missing a mentor for this 
issue. Any volunteers? 
Don't worry, we will find somebody to mentor! 

bq. I submitted the first draft of my proposal (LUCENE-2308 Separately specify a 
field's type), hope you can see it and give me some further pointers if needed. 

Yep, I can see it - looks good so far.

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
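
To make the proposed refactoring a bit more concrete, here is a purely
hypothetical sketch -- FieldType does not exist in Lucene today, and every
name below is illustrative only:

  // Hypothetical sketch of the refactoring, not existing Lucene API.
  public class FieldType {
    private boolean stored;
    private boolean indexed;
    private boolean omitNorms;
    private boolean storeTermVectors;

    public FieldType setStored(boolean v)    { this.stored = v;  return this; }
    public FieldType setIndexed(boolean v)   { this.indexed = v; return this; }
    public FieldType setOmitNorms(boolean v) { this.omitNorms = v; return this; }
    // ... a per-field Analyzer or Codec could hang off the same object ...
  }

  // The same FieldType instance would then be reused across many Field values:
  //   FieldType titleType = new FieldType().setStored(true).setIndexed(true);
  //   doc.add(new Field("title", "Lucene in Action", titleType));  // constructor shown is illustrative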



Re: Indexing Non-Textual Data

2011-04-06 Thread Andi Vajda


 Hi,

On Wed, 6 Apr 2011, Chris Spencer wrote:


I'm new to PyLucene, so forgive me if this is a newbie question. I have a
dataset composed of several thousand lists of 128 integer features, each
list associated with a class label. Would it be possible to use Lucene as a
classifier, by indexing the label with respect to these integer features,
and then classify a new list by finding the most similar labels with Lucene?


I believe there is support in Lucene for indexing numeric values using a 
Trie. Please ask on java-u...@lucene.apache.org (subscribe first by sending 
mail to java-user-subscr...@lucene.apache.org). There are many more Lucene 
experts with answers there.


For example, this class may be relevant:
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/document/NumericField.html

Andi..



I've been going through the PyLucene samples, but they only seem to involve
indexing text, not continuous features (understandably). Could anyone point
me to an example that indexes non-textual data?

I think the project Lire (http://www.semanticmetadata.net/lire/) is using
Lucene to do something similar to this, although with an emphasis on image
features. I've dug into their code a little, but I'm not a strong Java
programmer, so I'm not sure how they're pulling it off, nor how I might
translate this into the PyLucene API. In your opinion, is this a practical
use of Lucene?

Regards,
Chris
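
As a rough sketch of what the NumericField pointer above could look like for
the 128-feature vectors described in the question (field names and values are
made up; this is the Lucene 3.1 Java API, which PyLucene exposes one-to-one):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.NumericField;

  // One stored label field plus one NumericField per feature dimension.
  int[] features = new int[] {12, 7, 93};   // stand-in for the 128 integer features
  Document doc = new Document();
  doc.add(new Field("label", "classA", Field.Store.YES, Field.Index.NOT_ANALYZED));
  for (int i = 0; i < features.length; i++) {
    doc.add(new NumericField("feature" + i, Field.Store.NO, true).setIntValue(features[i]));
  }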



[jira] [Assigned] (LUCENE-2308) Separately specify a field's type

2011-04-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2308:
--

Assignee: Michael McCandless

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016582#comment-13016582
 ] 

Michael McCandless commented on LUCENE-2308:


Hi Nikola, I'd be happy to mentor for this issue!  Your proposal looks great.

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Bill Janssen
Andi Vajda va...@apache.org wrote:

 Unless I'm missing something here, you've got two options before you
 break your users:
   1. fix your code before you ship it to them

Unfortunately, the code is out there for building, and the instructions,
also already out there, say, PyLucene 2.4 to 3.X.  I should be more
careful :-).

   2. don't upgrade

It's the users that upgrade, not me.

 Yes, you could say that the same applies to PyLucene, of course :-)

:-)

 I'm not exactly sure what kind of backwards compat promises Lucene
 Java made going from 3.0 to 3.1 but the new QueryParser method
 overloads and the fact that there is no support for method overloads
 in Python make PythonQueryParser a bit stuck between a rock and a hard
 place. If you see a better way to fix the mess with the _quoted and
 _slop variants for getFieldQuery, a patch is welcome.

Sure.

Bill


Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Andi Vajda


On Wed, 6 Apr 2011, Bill Janssen wrote:


Andi Vajda va...@apache.org wrote:


Unless I'm missing something here, you've got two options before you
break your users:
  1. fix your code before you ship it to them


Unfortunately, the code is out there for building, and the instructions,
also already out there, say, PyLucene 2.4 to 3.X.  I should be more
careful :-).


Given that APIs changed quite a bit between 2.x and 3.0 and that 
2.x deprecated APIs are removed from 3.1+ (unless I'm confused about 
Lucene's deprecation policy (*)), your statement is a bit optimistic.


(*) maybe it's not until 4.0 that they're going to be removed ? I can't
remember at the moment. Mike, if you read this, can you please correct
me if I'm wrong ?

Andi..




  2. don't upgrade


It's the users that upgrade, not me.


Yes, you could say that the same applies to PyLucene, of course :-)


:-)


I'm not exactly sure what kind of backwards compat promises Lucene
Java made going from 3.0 to 3.1 but the new QueryParser method
overloads and the fact that there is no support for method overloads
in Python make PythonQueryParser a bit stuck between a rock and a hard
place. If you see a better way to fix the mess with the _quoted and
_slop variants for getFieldQuery, a patch is welcome.


Sure.

Bill



Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Michael McCandless
On Wed, Apr 6, 2011 at 6:38 PM, Andi Vajda va...@apache.org wrote:

 On Wed, 6 Apr 2011, Bill Janssen wrote:

 Andi Vajda va...@apache.org wrote:

 Unless I'm missing something here, you've got two options before you
 break your users:
  1. fix your code before you ship it to them

 Unfortunately, the code is out there for building, and the instructions,
 also already out there, say, PyLucene 2.4 to 3.X.  I should be more
 careful :-).

 Given that APIs changed quite a bit between 2.x and 3.0 and that 2.x
 deprecated APIs are removed from 3.1+ (unless I'm confused about Lucene's
 deprecation policy (*)), your statement is a bit optimistic.

 (*) maybe it's not until 4.0 that they're going to be removed ? I can't
    remember at the moment. Mike, if you read this, can you please correct
    me if I'm wrong ?

Actually, any API deprecated in any Lucene 2.x release is removed in
3.0.  (Same for 3.x to 4.0, etc.).

Mike

http://blog.mikemccandless.com


Re: [VOTE] Release PyLucene 3.1.0

2011-04-06 Thread Andi Vajda



On Wed, 6 Apr 2011, Michael McCandless wrote:


On Wed, Apr 6, 2011 at 6:38 PM, Andi Vajda va...@apache.org wrote:


On Wed, 6 Apr 2011, Bill Janssen wrote:


Andi Vajda va...@apache.org wrote:


Unless I'm missing something here, you've got two options before you
break your users:
 1. fix your code before you ship it to them


Unfortunately, the code is out there for building, and the instructions,
also already out there, say, PyLucene 2.4 to 3.X.  I should be more
careful :-).


Given that APIs changed quite a bit between 2.x and 3.0 and that 2.x
deprecated APIs are removed from 3.1+ (unless I'm confused about Lucene's
deprecation policy (*)), your statement is a bit optimistic.

(*) maybe it's not until 4.0 that they're going to be removed ? I can't
   remember at the moment. Mike, if you read this, can you please correct
   me if I'm wrong ?


Actually, any API deprecated in any Lucene 2.x release is removed in
3.0.  (Same for 3.x to 4.0, etc.).


Ah, I thought 3.0 had them both. Ok, duly noted.
Thanks !

Andi..

[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-04-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2959:


Assignee: Robert Muir

Setting myself as assignee as I'd like to mentor this one.

 [GSoC] Implementing State of the Art Ranking for Lucene
 ---

 Key: LUCENE-2959
 URL: https://issues.apache.org/jira/browse/LUCENE-2959
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Examples, Javadocs, Query/Scoring
Reporter: David Mark Nemeskey
Assignee: Robert Muir
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, 
 proposal.pdf


 Lucene employs the Vector Space Model (VSM) to rank documents, which compares
 unfavorably to state of the art algorithms, such as BM25. Moreover, the
 architecture is tailored specifically to VSM, which makes the addition of new
 ranking functions a non-trivial task.
 This project aims to bring state of the art ranking methods to Lucene and to
 implement a query architecture with pluggable ranking functions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
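
For reference, the BM25 ranking function mentioned above is, in its standard
form (k_1 and b are free parameters, f(q,D) the frequency of term q in
document D, |D| the document length, avgdl the average document length):

  score(D,Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot
               \frac{f(q,D)\,(k_1 + 1)}{f(q,D) + k_1 \left(1 - b + b \frac{|D|}{\mathrm{avgdl}}\right)}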



Re: My GSOC proposal

2011-04-06 Thread Adriano Crestani
Hi Varun,

Nice proposal, very complete. Only one thing is missing: you should mention
somewhere how many hours a week you are willing to spend working on the
project and whether there are any holidays during which you won't be able to work.

Good luck ;)

On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker varunthacker1...@gmail.comwrote:

  I have drafted the proposal on the official GSoC website. This is the link
  to my proposal: http://goo.gl/uYXrV . Please do let me know if anything
  needs to be changed, added or removed.

 I will keep on working on it till the deadline on the 8th.

 On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 That test code looks good -- you really should have seen awful
 performance had you used O_DIRECT since you read byte by byte.

 A more realistic test is to read a whole buffer (eg 4 KB is what
 Lucene now uses during merging, but we'd probably up this to like 1 MB
 when using O_DIRECT).

 Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
 for good reason: its existence means projects like ours can use it to
 work around limitations in the Linux IO apis that control the buffer
 cache when, otherwise, we might conceivably make patches to fix Linux
 correctly.  It's an escape hatch, and we all use the escape hatch
 instead of trying to fix Linux for real...

 For example the NOREUSE flag is a no-op now in Linux, which is a
 shame, because that's precisely the flag we'd want to use for merging
 (along with SEQUENTIAL).  Had that flag been implemented well, it'd
 give better results than our workaround using O_DIRECT.

  Anyway, given how things are, until we can get more control (way
 up in Javaland) over the buffer cache, O_DIRECT (via native directory
 impl through JNI) is our only real option, today.

 More details here:
 http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html

 Note that other OSs likely do a better job and actually implement
 NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
 would simply use NOREUSE on these platforms for I/O during segment
 merging.

 Mike

 http://blog.mikemccandless.com

 On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
 varunthacker1...@gmail.com wrote:
   Hi. I wrote sample code to test the speed difference between SEQUENTIAL
   and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
 
  This is the link to the code: http://pastebin.com/8QywKGyS
 
   There was a speed difference when I switched between the two flags. I
   have not used the O_DIRECT flag because Linus had criticized it.
 
   Is this what the flags are intended to be used for? This is just sample
   code with a test file.
 
  On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
  simon.willna...@googlemail.com wrote:
  Hey Varun,
  On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
  luc...@mikemccandless.com wrote:
  Hi Varun,
 
  Those two issues would make a great GSoC!  Comments below...
  +1
 
  On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
  varunthacker1...@gmail.com wrote:
 
  I would like to combine two tasks as part of my project
  namely-Directory createOutput and openInput should take an IOContext
  (Lucene-2793) and compliment it by Generalize DirectIOLinuxDir to
  UnixDir (Lucene-2795).
 
  The first part of the project is aimed at significantly reducing time
  taken to search during indexing by adding an IOContext which would
  store buffer size and have options to bypass the OS’s buffer cache
  (This is what causes the slowdown in search ) and other hints. Once
  completed I would move on to Lucene-2795 and generalize the Directory
  implementation to make a UnixDirectory .
 
  So, the first part (LUCENE-2793) should cause no change at all to
  performance, functionality, etc., because it's merely installing the
  plumbing (IOContext threaded throughout the low-level store APIs in
  Lucene) so that higher levels can send important details down to the
  Directory.  We'd fix IndexWriter/IndexReader to fill out this
  IOContext with the details (merging, flushing, new reader, etc.).
 
  There's some fun/freedom here in figuring out just what details should
  be included in IOContext... (eg: is it low level set buffer size to 4
  KB
  or is it high level I am opening a new near-real-time reader).
 
  This first step is a rote cutover, just changing APIs but in no way
  taking advantage of the new APIs.
 
  The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
  by creating a UnixDir impl that, using JNI (C code), passes advanced
  flags when opening files, based on the incoming IOContext.
 
  The goal is a single UnixDir that has ifdefs so that it's usable
  across multiple Unices, and eg would use direct IO if the context is
  merging.  If we are ambitious we could rope Windows into the mix, too,
  and then this would be NativeDir...
 
  We can measure success by validating that a big merge while searching
  does not hurt search performance?  (Ie we should be 

GSoC Lucene proposals

2011-04-06 Thread Adriano Crestani
Hi students,

We are receiving very good proposals this year, I am sure mentors are very
happy :)

I have one suggestion to make our (mentors') lives easier. Please add the
JIRA identifier to your proposal's title, for example: LUCENE-2883: Consolidate
Solr & Lucene FunctionQuery into modules. This will let mentors quickly
search for Lucene and Solr proposals, as all Apache proposals are mixed and
there is no way to sort by project.

Thanks!

--
Adriano Crestani


Re: GSoC Lucene proposals

2011-04-06 Thread Vinicius Barrox
Done!

--- On Wed, 6 Apr 2011, Adriano Crestani adrianocrest...@apache.org wrote:

From: Adriano Crestani adrianocrest...@apache.org
Subject: GSoC Lucene proposals
To: dev@lucene.apache.org
Date: Wednesday, April 6, 2011, 22:43

Hi students,
We are receiving very good proposals this year, I am sure mentors are very 
happy :)
I have one suggestion to make our (mentors') lives easier. Please add the JIRA 
identifier to your proposal's title, for example: LUCENE-2883: Consolidate Solr & 
Lucene FunctionQuery into modules. This will let mentors quickly search for 
Lucene and Solr proposals, as all Apache proposals are mixed and there is no 
way to sort by project.


Thanks!
--Adriano Crestani


[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser

2011-04-06 Thread Vinicius Barros (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016653#comment-13016653
 ] 

Vinicius Barros commented on LUCENE-1768:
-

Thanks for reviewing it Adriano. I updated the proposal to clarify it's the 
contrib query parser.

 NumericRange support for new query parser
 -

 Key: LUCENE-1768
 URL: https://issues.apache.org/jira/browse/LUCENE-1768
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Adriano Crestani
  Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0


 It would be good to specify some type of schema for the query parser in the 
 future, to automatically create NumericRangeQuery for different numeric 
 types. It would then be possible to index a numeric value 
 (double, float, long, int) using NumericField, and then the query parser knows 
 which type of field this is and so correctly creates a NumericRangeQuery 
 for strings like [1.567..*] or (1.787..19.5].
 There is currently no way to tell from the index whether a field is numeric, so 
 the user will have to configure the FieldConfig objects in the ConfigHandler. 
 But if this is done, it will not be that difficult to implement the rest.
 The only difference from the current handling of RangeQuery is then the 
 instantiation of the correct Query type and the conversion of the entered numeric 
 values (a simple Number.valueOf(...) cast of the user-entered numbers). 
 Everything else is identical; NumericRangeQuery also supports the MTQ 
 rewrite modes (as it is a MTQ).
 Another thing is a change in Date semantics. There are some strange flags in 
 the current parser that tell it how to handle dates.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
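
As a concrete illustration, the query the parser would have to produce for a
string like (1.787..19.5] can already be built programmatically in Lucene 3.x
(the field name below is made up):

  import org.apache.lucene.search.NumericRangeQuery;

  // Lower bound exclusive, upper bound inclusive, i.e. (1.787 .. 19.5]
  NumericRangeQuery<Double> query =
      NumericRangeQuery.newDoubleRange("price", 1.787, 19.5, false, true);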



Re: My GSOC proposal

2011-04-06 Thread Varun Thacker
I have updated my proposal online to mention the time I would be able to
dedicate to the project.

On Thu, Apr 7, 2011 at 7:05 AM, Adriano Crestani
adrianocrest...@gmail.comwrote:

 Hi Varun,

  Nice proposal, very complete. Only one thing is missing: you should mention
  somewhere how many hours a week you are willing to spend working on the
  project and whether there are any holidays during which you won't be able to work.

 Good luck ;)


 On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker 
 varunthacker1...@gmail.comwrote:

  I have drafted the proposal on the official GSoC website. This is the
  link to my proposal: http://goo.gl/uYXrV . Please do let me know if
  anything needs to be changed, added or removed.

 I will keep on working on it till the deadline on the 8th.

 On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 That test code looks good -- you really should have seen awful
 performance had you used O_DIRECT since you read byte by byte.

 A more realistic test is to read a whole buffer (eg 4 KB is what
 Lucene now uses during merging, but we'd probably up this to like 1 MB
 when using O_DIRECT).

 Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
 for good reason: its existence means projects like ours can use it to
 work around limitations in the Linux IO apis that control the buffer
 cache when, otherwise, we might conceivably make patches to fix Linux
 correctly.  It's an escape hatch, and we all use the escape hatch
 instead of trying to fix Linux for real...

 For example the NOREUSE flag is a no-op now in Linux, which is a
 shame, because that's precisely the flag we'd want to use for merging
 (along with SEQUENTIAL).  Had that flag been implemented well, it'd
 give better results than our workaround using O_DIRECT.

  Anyway, given how things are, until we can get more control (way
 up in Javaland) over the buffer cache, O_DIRECT (via native directory
 impl through JNI) is our only real option, today.

 More details here:
 http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html

 Note that other OSs likely do a better job and actually implement
 NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
 would simply use NOREUSE on these platforms for I/O during segment
 merging.

 Mike

 http://blog.mikemccandless.com

 On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
  varunthacker1...@gmail.com wrote:
   Hi. I wrote sample code to test the speed difference between SEQUENTIAL
   and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
 
  This is the link to the code: http://pastebin.com/8QywKGyS
 
   There was a speed difference when I switched between the two flags. I
   have not used the O_DIRECT flag because Linus had criticized it.
 
   Is this what the flags are intended to be used for? This is just sample
   code with a test file.
 
  On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
  simon.willna...@googlemail.com wrote:
  Hey Varun,
  On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
  luc...@mikemccandless.com wrote:
  Hi Varun,
 
  Those two issues would make a great GSoC!  Comments below...
  +1
 
  On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
  varunthacker1...@gmail.com wrote:
 
  I would like to combine two tasks as part of my project
  namely-Directory createOutput and openInput should take an IOContext
  (Lucene-2793) and compliment it by Generalize DirectIOLinuxDir to
  UnixDir (Lucene-2795).
 
  The first part of the project is aimed at significantly reducing
 time
  taken to search during indexing by adding an IOContext which would
  store buffer size and have options to bypass the OS’s buffer cache
  (This is what causes the slowdown in search ) and other hints. Once
  completed I would move on to Lucene-2795 and generalize the
 Directory
  implementation to make a UnixDirectory .
 
  So, the first part (LUCENE-2793) should cause no change at all to
  performance, functionality, etc., because it's merely installing
 the
  plumbing (IOContext threaded throughout the low-level store APIs in
  Lucene) so that higher levels can send important details down to the
  Directory.  We'd fix IndexWriter/IndexReader to fill out this
  IOContext with the details (merging, flushing, new reader, etc.).
 
  There's some fun/freedom here in figuring out just what details
 should
  be included in IOContext... (eg: is it low level set buffer size to
 4
  KB
  or is it high level I am opening a new near-real-time reader).
 
  This first step is a rote cutover, just changing APIs but in no way
  taking advantage of the new APIs.
 
  The 2nd step (LUCENE-2795) would then take advantage of this
 plumbing,
  by creating a UnixDir impl that, using JNI (C code), passes advanced
  flags when opening files, based on the incoming IOContext.
 
  The goal is a single UnixDir that has ifdefs so that it's usable
  across multiple Unices, and eg would use direct IO if the context is
  merging.  If we are ambitious we could