Re: Pylucene release

2012-11-29 Thread Andi Vajda

On Nov 29, 2012, at 5:37, Shawn Grant shawn.gr...@orcatec.com wrote:

 Hi Andi, thanks for the explanation.
 
 The main problem I've come across so far is that it looks like the main 
 branch lucene has a lucene41 codec in it that does not appear to be part of 
 the 4.0 release and (I think) is causing problems creating and/or retrieving 
 term vectors.  I'm not a lucene expert and it's been hard to diagnose.  I 
 also can't use Luke due to the codec.

PyLucene trunk is currently tracking Lucene's branch 4.x. I'd expect the 
lucene41 codec to be available there.

 I tried to set the default codec to lucene40 but then my index writer 
 complained that lucene40 was only for reading.

You should ask on the lucene-user@ list. There are more people listening there 
who would know the details.

 I'll try to contribute to porting the unit tests to help move the release 
 along.

Cool !

Andi..

 
 On 11/13/2012 02:18 PM, Andi Vajda wrote:
 
 Hi Shawn,
 
 On Tue, 13 Nov 2012, Shawn Grant wrote:
 
 Hi Andi, I was just wondering if Pylucene is on its usual schedule to 
 release 4-6 weeks after Lucene.  I didn't see any discussion of it on the 
 mailing list or elsewhere.  I'm looking forward to 4.0!
 
 Normally, PyLucene is released a few days after a Lucene release but 4.0 has 
 seen so many API changes and removals that all tests and samples need to be 
 ported to the new API. Last weekend, I ported a few but lots remain to be ported.
 
 If no one helps, it either means that no one cares enough or that everyone 
 is willing to be patient :-)
 
 The PyLucene trunk svn repository is currently tracking the Lucene Core 4.x 
 branch and you're welcome to use it out of svn. In the ten or so unit tests 
 I ported so far, I didn't find any issues with PyLucene proper (or JCC). All 
 changes were due to the tests being out of date or using deprecated APIs now 
 removed. You might find that PyLucene out-of-trunk is quite usable.
 
 If people want to help with porting PyLucene unit tests, the ones under its 
 'test' directory not yet ported, feel free to ask questions here.
 The gist of it is:
  - fix the imports (look at the first few tests, alphabetically, for examples)
  - fix the tests to pass by looking at the original Java tests for changes,
    as most of these tests were originally ported from Java Lucene.
 
 Once you're familiar with the new APIs, porting the sample code in samples 
 and in LuceneInAction should be fairly straightforward. It's just that there is 
 a lot to port.
 
 Andi..


[jira] [Commented] (LUCENE-4345) Create a Classification module

2012-11-29 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506317#comment-13506317
 ] 

Commit Tag Bot commented on LUCENE-4345:


[trunk commit] Uwe Schindler
http://svn.apache.org/viewvc?view=revision&revision=1415074

LUCENE-4345: Fix forbidden APIs and make the test more predictable



 Create a Classification module
 --

 Key: LUCENE-4345
 URL: https://issues.apache.org/jira/browse/LUCENE-4345
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
 Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, 
 SOLR-3700_2.patch, SOLR-3700.patch


 Lucene/Solr can host huge sets of documents containing lots of information in 
 fields so that these can be used as training examples (w/ features) in order 
 to very quickly create classifiers algorithms to use on new documents and / 
 or to provide an additional service.
 So the idea is to create a contrib module (called 'classification') to host a 
 ClassificationComponent that will use already seen data (the indexed 
 documents / fields) to classify new documents / text fragments.
 The first version will contain a (simplistic) Lucene based Naive Bayes 
 classifier but more implementations should be added in the future.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_37) - Build # 2938 - Still Failing!

2012-11-29 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/2938/
Java: 32bit/jdk1.6.0_37 -server -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 13789 lines...]
-check-forbidden-test-apis:
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/tests.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:65)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:70)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:71)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:71)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:71)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:73)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:111)
[forbidden-apis] Scanned 2157 (and 215 related) class file(s) for forbidden API 
invocations (in 2.10s), 7 error(s).

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:69: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:174: Check 
for forbidden API calls failed, see log.

Total time: 20 minutes 57 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.6.0_37 -server -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure
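
For readers wondering why the no-argument java.util.Random constructor is flagged in test code: an unseeded generator produces a different sequence on every run, so a failure cannot be replayed. The sketch below is only an illustration in plain Java, not the actual fix committed for LUCENE-4345; the class name and the seed value are made up for the example. Inside Lucene tests, the test framework's own random source (LuceneTestCase.random()) would normally be used instead of constructing a Random by hand.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative only: a fixed seed makes a "random" test input replayable. */
class SeededRandomSketch {

    /** Draws count ints in [0, bound) from a generator built with an explicit seed. */
    static int[] draw(long seed, int count, int bound) {
        Random random = new Random(seed); // explicit seed: the sequence can be reproduced
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        long seed = 42L; // hypothetical seed; a test framework would log the one it picked
        int[] first = draw(seed, 5, 100);
        int[] second = draw(seed, 5, 100);
        // Both arrays hold the same values because both generators started from the same seed.
        System.out.println("same seed, same sequence: " + Arrays.equals(first, second));
    }
}
```

With new Random() there is no seed to log, which is presumably why the build forbids it in tests.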




[jira] [Updated] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-11-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4575:
---

Attachment: LUCENE-4575.patch

The patch adds setCommitData to IndexWriter; it increments changeCount and 
sets the commitData on segmentInfos. It also adds a test to verify the 
behavior.

Regarding back-compat - I prefer to nuke commit(data) and prepareCommit(data), 
in exchange for this API, for both trunk and 4.x.

This patch however still supports the old commit(data)/prepareCommit(data) API, 
but I think that it will be simpler if we just nuke these APIs. The migration to 
the new API is a no-brainer: just call setCommitData before your commit().

I don't intend to commit it yet, depending on how we decide to handle 
back-compat. If we decide to keep the back-compat support, I want to move the 
commit(data) and prepareCommit(data) impls to their respective no-data versions, 
and then deprecate these APIs and have them call setCommitData() followed by the 
respective no-data version.
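
To make the idea concrete, here is a minimal sketch of the "dirty bit" behavior in plain Java. ToyWriter and all of its members are hypothetical stand-ins and not Lucene's IndexWriter; only the names setCommitData, commit and changeCount are borrowed from the comment above. It assumes commit() is a no-op when nothing changed since the last commit.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not Lucene code): setting commit data alone makes commit() do work. */
class ToyWriter {
    private long changeCount;           // bumped whenever there is something new to commit
    private long lastCommitChangeCount; // changeCount as of the last successful commit
    private Map<String, String> pendingCommitData = Collections.emptyMap();
    private Map<String, String> committedData = Collections.emptyMap();

    /** Records user data for the next commit and marks the writer as changed. */
    void setCommitData(Map<String, String> data) {
        pendingCommitData = new HashMap<>(data); // copy, so later caller edits do not leak in
        changeCount++;                           // the "dirty bit": commit() now has work to do
    }

    /** Returns true if a commit was actually written, false if there was nothing to commit. */
    boolean commit() {
        if (changeCount == lastCommitChangeCount) {
            return false; // no changes since the last commit
        }
        committedData = pendingCommitData;
        lastCommitChangeCount = changeCount;
        return true;
    }

    Map<String, String> committedData() {
        return Collections.unmodifiableMap(committedData);
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        System.out.println("commit with no changes: " + writer.commit());

        Map<String, String> data = new HashMap<>();
        data.put("appState", "v1"); // hypothetical key and value
        writer.setCommitData(data);
        System.out.println("commit after setCommitData: " + writer.commit());
        System.out.println("committed data: " + writer.committedData());
    }
}
```

In this model the migration described above is just the two calls in main: setCommitData(data) followed by the no-argument commit().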

 Allow IndexWriter to commit, even just commitData
 -

 Key: LUCENE-4575
 URL: https://issues.apache.org/jira/browse/LUCENE-4575
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-4575.patch


 Spinoff from here 
 http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
 In some cases, it is valuable to be able to commit changes to the index, even 
 if the changes are just commitData. Such data is sometimes used by 
 applications to register in the index some global application 
 information/state.
 The proposal is:
 * Add a setCommitData() API and separate it from commit() and prepareCommit() 
 (simplify their API)
 * When that API is called, flip on the dirty/changes bit, so that this gets 
 committed even if no other changes were made to the index.
 I will work on a patch and post.




[jira] [Commented] (LUCENE-3668) offsets issues with multiword synonyms

2012-11-29 Thread Okke Klein (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506378#comment-13506378
 ] 

Okke Klein commented on LUCENE-3668:


Doesn't work for me either in Solr4. Can we revisit this issue?

 offsets issues with multiword synonyms
 --

 Key: LUCENE-3668
 URL: https://issues.apache.org/jira/browse/LUCENE-3668
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 3.6, 4.0-ALPHA

 Attachments: LUCENE-3668.patch, LUCENE-3668_test.patch


 as reported on the list, there are some strange offsets with FSTSynonyms, in 
 the case of multiword synonyms.
 as a workaround it was suggested to use the older synonym impl, but it has 
 bugs too (just in a different way).




[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-29 Thread Lukas Graf (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506392#comment-13506392
 ] 

Lukas Graf commented on SOLR-139:
-

Ok, I finally figured it out by diffing every single difference from my test 
case to the stock Solr 4.0 example using _git bisect_.

The culprit was a missing <updateLog /> directive in _solrconfig.xml_. As 
soon as I configured a transaction log, atomic updates worked as expected. I 
added a note about this at 
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
 .

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3484 - Failure

2012-11-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3484/

All tests passed

Build Log:
[...truncated 20084 lines...]
-documentation-lint:
 [echo] checking for broken html...
[jtidy] Checking for broken html (such as invalid tags)...
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build/jtidy_tmp
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] build/docs/classification/overview-summary.html
 [exec]   missing: org.apache.lucene.classification.utils
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/utils/package-summary.html
 [exec]   no package description (missing package.html in src?)
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/build.xml:62:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build.xml:245:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/common-build.xml:1944:
 exec returned: 1

Total time: 24 minutes 0 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_09) - Build # 2942 - Failure!

2012-11-29 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/2942/
Java: 64bit/jdk1.7.0_09 -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 20067 lines...]
-documentation-lint:
 [echo] checking for broken html...
[jtidy] Checking for broken html (such as invalid tags)...
   [delete] Deleting directory 
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/jtidy_tmp
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] build/docs/classification/overview-summary.html
 [exec]   missing: org.apache.lucene.classification.utils
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/utils/package-summary.html
 [exec]   no package description (missing package.html in src?)
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:62: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:245: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1944:
 exec returned: 1

Total time: 20 minutes 14 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseG1GC
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-11-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506432#comment-13506432
 ] 

Michael McCandless commented on LUCENE-4575:


+1 to do a hard break; this is expert.






[jira] [Comment Edited] (LUCENE-3668) offsets issues with multiword synonyms

2012-11-29 Thread Okke Klein (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506378#comment-13506378
 ] 

Okke Klein edited comment on LUCENE-3668 at 11/29/12 12:36 PM:
---

Doesn't work for me either in Solr4. Can we revisit this issue?

Perhaps this http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ 
can give some insight/help?

  was (Author: okkeklein):
Doesn't work for me either in Solr4. Can we revisit this issue?
  




[jira] [Updated] (SOLR-4120) Collection API: Support for specifying a list of solrs to spread a new collection across

2012-11-29 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4120:
-

Attachment: SOLR-4120.patch

h4. SOLR-4120.patch

h5. Where does it fit

* It fits on top of revision 1412602 of branch lucene_solr_4_0, where the patch 
for SOLR-4114 has already been applied. The following should work if you have a 
checkout of revision 1412602 of branch lucene_solr_4_0
** cd checkout-folder
** patch -s -p0 < SOLR-4114.patch
** patch --ignore-whitespace -p0 < SOLR-4120.patch

You need the --ignore-whitespace - at least with my version of patch on Snow 
Leopard. Probably because I do not have the correct Solr code-style installed 
in my Eclipse. Hmmm, probably should do that.

h5. Content of the patch

The patch modifies the create operation of the Solr Collection API so that it 
lets you provide a list of Solrs that the shards for the new collection should 
be spread across
* Param key: createNodeSet
* Param value: comma-separated list of node-names (equal to the node-names 
received from ClusterState.getLiveNodes())
* Param is not mandatory. If not provided the created collection will still 
have its shards spread across all live nodes

h5. Testing 

BasicDistributedZkTest.testCollectionAPI has been modified to also test this 
feature
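
As a rough illustration of how such a parameter could be interpreted, here is a sketch in plain Java. It is not the code from SOLR-4120.patch: the class, the method candidateNodes and the node names are hypothetical, and it assumes that names which are not currently live are simply ignored (the patch may treat them differently). When the parameter is absent, all live nodes remain candidates, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: turn a comma-separated createNodeSet value into candidate nodes. */
class CreateNodeSetSketch {

    /**
     * Returns the nodes a new collection may be spread across.
     * A null or blank createNodeSet means "all live nodes" (the default behavior).
     * Assumption for this sketch: requested names that are not live are skipped.
     */
    static List<String> candidateNodes(String createNodeSet, Set<String> liveNodes) {
        if (createNodeSet == null || createNodeSet.trim().isEmpty()) {
            return new ArrayList<>(liveNodes);
        }
        List<String> chosen = new ArrayList<>();
        for (String name : createNodeSet.split(",")) {
            String trimmed = name.trim();
            if (liveNodes.contains(trimmed) && !chosen.contains(trimmed)) {
                chosen.add(trimmed); // keep the order given by the caller, without duplicates
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Made-up node names, only for the example.
        Set<String> live = new LinkedHashSet<>(
                Arrays.asList("host1:8983_solr", "host2:8983_solr", "host3:8983_solr"));
        System.out.println(candidateNodes("host3:8983_solr,host1:8983_solr", live));
        System.out.println(candidateNodes(null, live));
    }
}
```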

 Collection API: Support for specifying a list of solrs to spread a new 
 collection across
 

 Key: SOLR-4120
 URL: https://issues.apache.org/jira/browse/SOLR-4120
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Per Steffensen
Priority: Minor
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4120.patch


 When creating a new collection through the Collection API, the Overseer 
 (handling the creation) will spread shards for this new collection across all 
 live nodes.
 Sometimes you don't want a collection spread across all available nodes. Allow 
 the create operation of the Collection API to take a createNodeSet 
 parameter containing a list of Solrs to spread the new shards across. If not 
 provided it will just spread across all available nodes (default).
 For an example of a concrete case of usage see: 
 https://issues.apache.org/jira/browse/SOLR-4114?focusedCommentId=13505506&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13505506




[jira] [Comment Edited] (SOLR-4120) Collection API: Support for specifying a list of solrs to spread a new collection across

2012-11-29 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506444#comment-13506444
 ] 

Per Steffensen edited comment on SOLR-4120 at 11/29/12 12:56 PM:
-

h4. SOLR-4120.patch

h5. Where does it fit

* It fits on top of revision 1412602 of branch lucene_solr_4_0, where the patch 
for SOLR-4114 has already been applied. The following should work if you have a 
checkout of revision 1412602 of branch lucene_solr_4_0
** cd checkout-folder
** patch -s -p0 < SOLR-4114.patch
** patch --ignore-whitespace -p0 < SOLR-4120.patch

You need the --ignore-whitespace - at least with my version of patch on Snow 
Leopard. Probably because I do not have the correct Solr code-style installed 
in my Eclipse. Hmmm, probably should do that.

h5. Content of the patch

The patch modifies the create operation of the Solr Collection API so that it 
lets you provide a list of Solrs that the shards for the new collection should 
be spread across
* Param key: createNodeSet (OverseerCollectionProcessor.CREATE_NODE_SET)
* Param value: comma-separated list of node-names (equal to the node-names 
received from ClusterState.getLiveNodes())
* Param is not mandatory. If not provided the created collection will still 
have its shards spread across all live nodes

h5. Testing 

BasicDistributedZkTest.testCollectionAPI has been modified to also test this 
feature





[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-11-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506468#comment-13506468
 ] 

Shai Erera commented on LUCENE-4575:


Thanks. I forgot to mention two things about the changes in the patch, which I 
wasn't sure about:

# I currently copy the commitData map on setCommitData. It seems safe to do it, 
and I don't think commitData is huge. Any objections?
# I pass the copied map directly to segmentInfos, rather than saving it in 
a member in IW. Do you see any issues with it? (I'm thinking about rollback, 
even though we have another copy of the segmentInfos for rollback purposes ...)
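
On the first question, a small plain-Java sketch of what the copy protects against. Nothing here is Lucene code; the class and method names are invented for the illustration. It contrasts keeping the caller's map by reference with keeping a snapshot, and then mutates the caller's map afterwards.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: storing a caller's map by reference vs. by copy. */
class CommitDataCopySketch {

    /** Keeps the caller's map itself, so later edits by the caller remain visible. */
    static Map<String, String> storeByReference(Map<String, String> data) {
        return data;
    }

    /** Keeps a snapshot taken at call time, unaffected by later edits. */
    static Map<String, String> storeByCopy(Map<String, String> data) {
        return new HashMap<>(data);
    }

    public static void main(String[] args) {
        Map<String, String> caller = new HashMap<>();
        caller.put("epoch", "1"); // made-up key and value

        Map<String, String> aliased = storeByReference(caller);
        Map<String, String> copied = storeByCopy(caller);

        caller.put("epoch", "2"); // the caller keeps using its map after handing it over

        System.out.println("by reference sees: " + aliased.get("epoch"));
        System.out.println("by copy sees:      " + copied.get("epoch"));
    }
}
```

For small maps the cost of the copy is negligible, and it keeps what gets committed independent of what the application does with its own map later.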





[jira] [Updated] (SOLR-4124) You should be able to set the update log directory with the CoreAdmin API the same way as the data directory.

2012-11-29 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4124:
--

Attachment: SOLR-4124.patch

First cut at a patch.

 You should be able to set the update log directory with the CoreAdmin API the 
 same way as the data directory.
 -

 Key: SOLR-4124
 URL: https://issues.apache.org/jira/browse/SOLR-4124
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.1, 5.0

 Attachments: SOLR-4124.patch







[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2012-11-29 Thread Roman Slavik (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506475#comment-13506475
 ] 

Roman Slavik commented on SOLR-1604:


Hi, I downloaded the latest version of ComplexPhrase (24/Oct/12 02:30) but have 
a problem with the JUnit test. Here is the error log:
{noformat}
test(org.apache.solr.search.ComplexPhraseQParserPluginTest)  Time elapsed: 0.191 sec  <<< ERROR!
java.lang.RuntimeException: Exception during query
at __randomizedtesting.SeedInfo.seed([4BF35CC9C13F3B15:C3A763136FC356ED]:0)
at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:283)
at org.apache.solr.search.ComplexPhraseQParserPluginTest.test(ComplexPhraseQParserPluginTest.java:158)
// nothing of interest here
Caused by: java.lang.IllegalArgumentException: Unknown query type org.apache.lucene.search.ConstantScoreQuery found in phrase query string jo* [sma TO smz]
at org.apache.lucene.queryparser.classic.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:297)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:599)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:646)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1385)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1260)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:390)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:411)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
at org.apache.solr.util.TestHarness.query(TestHarness.java:364)
at org.apache.solr.util.TestHarness.query(TestHarness.java:346)
at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:273)
... 41 more
{noformat}
Is this an error on my side (I didn't change anything), or is it a critical bug?

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
 ComplexPhraseQueryParser.java, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3485 - Still Failing

2012-11-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3485/

All tests passed

Build Log:
[...truncated 20087 lines...]
-documentation-lint:
 [echo] checking for broken html...
[jtidy] Checking for broken html (such as invalid tags)...
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build/jtidy_tmp
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] 
build/docs/classification/org/apache/lucene/classification/utils/DatasetSplitter.html
 [exec]   missing Constructors: DatasetSplitter(double, double)
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/build.xml:62:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build.xml:259:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/common-build.xml:1944:
 exec returned: 1

Total time: 24 minutes 47 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




Re: Active 4.x branches?

2012-11-29 Thread Yonik Seeley
On Thu, Nov 29, 2012 at 1:24 AM, David Smiley (@MITRE.org)
dsmi...@mitre.org wrote:
 Maybe we should have a
 roster somewhere of parts of the codebase that have an owner.

Taking ownership is a mindset, and is very different from any kind of
recognized having ownership.
We shouldn't tag areas as owned by someone, as that could discourage
others getting involved in that area.
It might also encourage deference to the owner, which would also be
a bad thing.  We sometimes naturally defer to someone with more
experience in an area than we have, but it should continue to be on an
informal case-by-case basis.

 It could be
 useful to people not in the know on who to contact

The right contact point is this mailing list.
There's already way too much off-list (and off IRC channel)
collaboration that goes on IMO.

-Yonik
http://lucidworks.com




[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-11-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506492#comment-13506492
 ] 

Yonik Seeley commented on LUCENE-4575:
--

bq. I currently copy the commitData map on setCommitData. It seems safe to do 
it, and I don't think commitData are huge. Any objections?

Do any users care about order (i.e. they pass in a LinkedHashMap)?  It would be 
trivial to preserve *if* it added value for some.

 Allow IndexWriter to commit, even just commitData
 -

 Key: LUCENE-4575
 URL: https://issues.apache.org/jira/browse/LUCENE-4575
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-4575.patch


 Spinoff from here 
 http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
 In some cases, it is valuable to be able to commit changes to the index, even 
 if the changes are just commitData. Such data is sometimes used by 
 applications to register in the index some global application 
 information/state.
 The proposal is:
 * Add a setCommitData() API and separate it from commit() and prepareCommit() 
 (simplify their API)
 * When that API is called, flip on the dirty/changes bit, so that this gets 
 committed even if no other changes were made to the index.
 I will work on a patch and post it.




[jira] [Commented] (SOLR-4050) Solr example fails to start in nightly-smoke

2012-11-29 Thread Tomás Fernández Löbbe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506495#comment-13506495
 ] 

Tomás Fernández Löbbe commented on SOLR-4050:
-

I'm having this exact issue after upgrading (trunk). Is there something I 
should clean/rebuild/delete in order to get this to work?

 Solr example fails to start in nightly-smoke
 

 Key: SOLR-4050
 URL: https://issues.apache.org/jira/browse/SOLR-4050
 Project: Solr
  Issue Type: Bug
Reporter: Michael McCandless
Priority: Blocker

 The nightly smoke job is stalled (I'll go kill it shortly): 
 https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/22/console
 It's stalled when trying to run the Solr example ... the server produced this 
 output:
 {noformat}
 java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
   at org.eclipse.jetty.start.Main.invokeMain(Main.java:424)
   at org.eclipse.jetty.start.Main.start(Main.java:602)
   at org.eclipse.jetty.start.Main.main(Main.java:82)
 ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration
 Usage: java -jar start.jar [options] [properties] [configs]
java -jar start.jar --help  # for more information
 {noformat}
 Seems likely the Jetty upgrade somehow caused this...
 Separately I committed a fix to smoke tester so that it quickly fails if the 
 Solr example fails to start ...




[jira] [Commented] (SOLR-4050) Solr example fails to start in nightly-smoke

2012-11-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506504#comment-13506504
 ] 

Yonik Seeley commented on SOLR-4050:


Tomas: try removing start.jar and let ivy re-get it.




Re: Active 4.x branches?

2012-11-29 Thread David Smiley (@MITRE.org)
Those are good points Yonik.  I guess I don't know what to think anymore.


Yonik Seeley-4 wrote
 On Thu, Nov 29, 2012 at 1:24 AM, David Smiley (@MITRE.org)
 <DSMILEY@...> wrote:
 Maybe we should have a
 roster somewhere of parts of the codebase that have an owner.
 
 Taking ownership is a mindset, and is very different from any kind of
 recognized having ownership.
 We shouldn't tag areas as owned by someone, as that could discourage
 others getting involved in that area.
 It might also encourage deference to the owner, which would also be
 a bad thing.  We sometimes naturally defer to someone with more
 experience in an area than we have, but it should continue to be on an
 informal case-by-case basis.
 
 It could be
 useful to people not in the know on who to contact
 
 The right contact point is this mailing list.
 There's already way too much off-list (and off IRC channel)
 collaboration that goes on IMO.
 
 -Yonik
 http://lucidworks.com
 





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Active-4-x-branches-tp4022609p4023246.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




[jira] [Created] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.

2012-11-29 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4125:
-

 Summary: There are a few small changes in 5x that should be in 4x 
but are not.
 Key: SOLR-4125
 URL: https://issues.apache.org/jira/browse/SOLR-4125
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.1, 5.0


Someone pinged me today about a very small part of a fix that is in 5x but not 
4x. I've done a bit of comparing and found a couple of such things. I'll merge 
them back.




[jira] [Commented] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.

2012-11-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506509#comment-13506509
 ] 

Uwe Schindler commented on SOLR-4125:
-

What is this issue about...?

 There are a few small changes in 5x that should be in 4x but are not.
 -

 Key: SOLR-4125
 URL: https://issues.apache.org/jira/browse/SOLR-4125
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.1, 5.0


 Someone pinged me today about a very small part of a fix that is in 5x but 
 not 4x. I've done a bit of comparing and found a couple of such things. I'll 
 merge them back.




[jira] [Commented] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.

2012-11-29 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506510#comment-13506510
 ] 

Commit Tag Bot commented on SOLR-4125:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1415191

SOLR-4125: There are a few small changes in 5x that should be in 4x but are not.






Re: Active 4.x branches?

2012-11-29 Thread Jack Krupansky
Hey, this is open source, which means that everything is fair game for 
everybody. Anybody (even a non-committer) can be an owner simply by being 
active in the conversations in any area. So, who is the owner of any area 
will vary over the course of a year. Sometimes people take breaks (or have 
real work assignments), so their ownership may fade down and later fade 
up again.


Although the email list is one of the primary mediums for conversations, 
Jiras and comments in the Jiras, as well as svn commit history, will make it 
clear to any newcomer who are the owners (and that should/must be plural!) 
or most-interested parties in a particular area.


If at any point it looks like there is a single owner in an area, that is 
a sign of potential trouble. Keep the bus factor in mind.


-- Jack Krupansky

-Original Message- 
From: David Smiley (@MITRE.org)

Sent: Thursday, November 29, 2012 9:48 AM
To: dev@lucene.apache.org
Subject: Re: Active 4.x branches?

Those are good points Yonik.  I guess I don't know what to think anymore.





[jira] [Commented] (SOLR-4050) Solr example fails to start in nightly-smoke

2012-11-29 Thread Tomás Fernández Löbbe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506511#comment-13506511
 ] 

Tomás Fernández Löbbe commented on SOLR-4050:
-

That did the trick, thanks.




[jira] [Commented] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.

2012-11-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506512#comment-13506512
 ] 

Uwe Schindler commented on SOLR-4125:
-

The commit explained it - sorry for the noise :-)




[jira] [Commented] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.

2012-11-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506514#comment-13506514
 ] 

Mark Miller commented on SOLR-4125:
---

Yeah, just a sync up issue - some small part of a fix missed being merged back 
in some of my work - now I'm on the hunt for anything else I may have missed!




[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-29 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506520#comment-13506520
 ] 

Jack Krupansky commented on SOLR-139:
-

Oh, yeah, that. I actually was going to mention it, but I wanted to focus on 
running with the stock Solr example first. Actually, we need to look a little 
closer as to why/whether the updateLog directive is really always needed for 
partial document update. That should probably be a separate Jira issue.


 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




RE: Active 4.x branches?

2012-11-29 Thread Dyer, James
Whenever I want to know who owns a piece of code, I just look at the svn 
history to see who has been modifying it.  
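
A minimal sketch of that approach in Python, assuming the history comes from
`svn log -q`, which prints one `rNNN | author | date` header line per
revision. The function name count_committers is made up for illustration and
is not part of any Lucene tooling.

```python
import collections
import re

# Header line of each entry printed by `svn log -q`, for example:
#   r123 | alice | 2012-11-29 09:48:00 -0500 (Thu, 29 Nov 2012)
# (the revision number and author above are invented sample values)
_ENTRY = re.compile(r"^r\d+ \| ([^|]+) \| ")


def count_committers(svn_log_text):
    """Return (author, commit_count) pairs, most active author first."""
    counts = collections.Counter()
    for line in svn_log_text.splitlines():
        match = _ENTRY.match(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts.most_common()
```

The text to parse could come from, for example,
`subprocess.run(["svn", "log", "-q", path], capture_output=True, text=True).stdout`.
A high count only shows who has touched the file recently, not who "owns" it.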

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: David Smiley (@MITRE.org) [mailto:dsmi...@mitre.org] 
Sent: Thursday, November 29, 2012 8:49 AM
To: dev@lucene.apache.org
Subject: Re: Active 4.x branches?

Those are good points Yonik.  I guess I don't know what to think anymore.





[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-11-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506526#comment-13506526
 ] 

Shai Erera commented on LUCENE-4575:


We use commitData extensively but we don't care about the order. We store 
key/value pairs.

I don't think though that it's trivial to support. Currently the user can pass 
any Map, but IndexReader returns in practice a HashMap 
(DataInput.readStringStringMap initializes a HashMap). Therefore, if we want to 
preserve the type of the Map, we'd need to change DataInput/Output code. I'm 
not sure it's worth the hassle, but let's discuss that anyway on a separate 
issue? It's not really related to how the map is set.




[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-11-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506528#comment-13506528
 ] 

Uwe Schindler commented on LUCENE-4575:
---

The API returns Map<String,String>, so we make no guarantees about order.




AW: AW: Pylucene release

2012-11-29 Thread Thomas Koch
Hi Andi, 
thanks for your instructions - in the meantime I managed to install PyLucene
(4.0) from trunk and started working on test_FuzzyQuery. I will send you a
patch once I have updated a few tests. Just wanted to let you know about the
(slow) progress - sorry for the late reply!

regards,
Thomas 

 -Original Message-
 From: Andi Vajda [mailto:va...@apache.org]
 Sent: Wednesday, 14 November 2012 18:36
 To: pylucene-...@lucene.apache.org
 Subject: Re: AW: Pylucene release
 
 
   Hi Thomas,
 
 On Wed, 14 Nov 2012, Thomas Koch wrote:
 
  I still wanted to check the API changes related to 4.0 and could then
  help with porting the example code (and/or unit tests). I hope there
  are more people interested in helping to port PyLucene (or at least the
  'related' Python code) to the Lucene 4.0 level...
 
  How can we best proceed?
 
   1. Pick a test that fails (for example: python test/test_FuzzyQuery.py)
   2. Announce you're working on it on the list (so that only you do)
   3. Fix it
   4. Send in a patch
 
  I assume you checked in the code that's adapted already to SVN.
 
 Yes, all current code is checked in, including fixed or broken tests.
 
  Is there a list of code that needs to be ported (and can be used to
  distribute tasks)?
 
 Currently, all tests in test/ up to test_FilteredQuery.py (alphabetically)
 pass. The test_ICU* ones also pass. You should use these as examples on how
 to fix failing ones.
 
  As said, I don't have an idea of the API changes yet, so it's hard to
  estimate the time needed to get used to 4.0
 
 No time estimate is expected from you.
 It's best to proceed by example. Look at the tests that already pass (and
 thus have been fixed) as examples.
 The steps to fix a failing test are as follows:
   - fix import statements first (they have all changed since PyLucene 4.0
     no longer uses a flat namespace but strictly follows the original
     Java package structure now)
     for example:
       from lucene import Document
     becomes
       from org.apache.lucene.document import Document
     If you don't know where a class is (and the Lucene tree is deeply
     nested), find lucene src -name ClassName.java will usually give
     you an idea of the package structure to import
   - when it makes sense (most of the time), use PyLuceneTestCase as the
     parent test class. This will help with the complexities/boilerplate
     in creating a test IndexWriter/Reader/Searcher using a RAMDirectory
   - if the test still fails, look at the original Java test code for
     possible changes in the API or in the expected behaviour that
     occurred since the first port. The original Java test file is usually
     named TestName.java when the Python test is named test_Name.py
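 
 As a rough illustration of the import step above, the sketch below turns a
 path printed by find into the matching import line. The helper name
 import_line_for and the example path are assumptions made for illustration;
 neither is part of PyLucene.

```python
def import_line_for(java_path):
    """Derive a PyLucene 4.0 style import from a Lucene source path.

    Hypothetical helper: it assumes the package directories start at the
    first 'org' path component, as in .../src/java/org/apache/lucene/...
    """
    parts = java_path.replace("\\", "/").split("/")
    if "org" not in parts:
        raise ValueError("no 'org' package root in %r" % java_path)
    file_name = parts[-1]
    if not file_name.endswith(".java"):
        raise ValueError("not a Java source file: %r" % java_path)
    # Everything between 'org' and the file name is the package.
    package_parts = parts[parts.index("org"):-1]
    class_name = file_name[:-len(".java")]
    return "from %s import %s" % (".".join(package_parts), class_name)


# The example path is an assumption about the Lucene tree layout.
print(import_line_for(
    "lucene/core/src/java/org/apache/lucene/document/Document.java"))
```

 The result still has to be checked by hand, since nested classes and
 classes renamed in 4.0 will not map this simply.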
 
 Andi..
 
  (and fix the code), but as you did that already maybe you can share
  your experience with us. As with any new major release (e.g. Python
  3.x) I guess many of us are afraid to move forward to the new release
  and change our code base, but certainly that's just a matter of time ...
 
  Cheers,
  Thomas
 
  -Original Message-
  From: Andi Vajda [mailto:va...@apache.org]
  Sent: Tuesday, 13 November 2012 23:18
  To: Shawn Grant
  Cc: pylucene-...@lucene.apache.org
  Subject: Re: Pylucene release
 
 
Hi Shawn,
 
  On Tue, 13 Nov 2012, Shawn Grant wrote:
 
  Hi Andi, I was just wondering if Pylucene is on its usual schedule
  to release 4-6 weeks after Lucene.  I didn't see any discussion of it
  on the mailing list or elsewhere.  I'm looking forward to 4.0!
 
  Normally, PyLucene is released a few days after a Lucene release but
  4.0 has seen so many API changes and removals that all tests and
  samples need to be ported to the new API. Last week-end, I ported a
  few but lots remain to be.
 
  If no one helps, it either means that no one cares enough or that
  everyone is willing to be patient :-)
 
  The PyLucene trunk svn repository is currently tracking the Lucene
  Core 4.x branch and you're welcome to use it out of svn. In the ten or
  so unit tests I ported so far, I didn't find any issues with PyLucene
  proper (or JCC). All changes were due to the tests being out of date
  or using deprecated APIs now removed. You might find that PyLucene
  out-of-trunk is quite usable.
 
  If people want to help with porting PyLucene unit tests, the ones
  under its 'test' directory not yet ported, feel free to ask questions
  here. The gist of it is:
    - fix the imports (look at the first few tests for example,
      alphabetically)
    - fix the tests to pass by looking at the original Java tests for
      changes as most of these tests were originally ported from Java
      Lucene.
 
  Once you're familiar with the new APIs, porting the sample code in
  samples and in LuceneInAction should be fairly straightforward. It's
  just that there is a lot to port.
 
  Andi..
 
 
 




[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-11-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506530#comment-13506530
 ] 

Yonik Seeley commented on LUCENE-4575:
--

bq. I don't think though that it's trivial to support. Currently the user can 
pass any Map, but IndexReader returns in practice a HashMap 
(DataInput.readStringStringMap initializes a HashMap). 

If a user cared about order, then they would pass a LinkedHashMap.  Then the 
only thing that would need to change is DataInput.readStringStringMap: 
s/HashMap/LinkedHashMap.

bq. it's not really related to how the map is set.

It is... if you make a copy of the map and we want to preserve order, it's new 
LinkedHashMap instead of HashMap.

It's a minor enough point that I don't think it deserves its own issue.  I 
don't personally care about preserving order - but I did think it was worth at 
least bringing up.

 Allow IndexWriter to commit, even just commitData
 -

 Key: LUCENE-4575
 URL: https://issues.apache.org/jira/browse/LUCENE-4575
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-4575.patch


 Spinoff from here 
 http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html.
 In some cases, it is valuable to be able to commit changes to the index, even 
 if the changes are just commitData. Such data is sometimes used by 
 applications to register in the index some global application 
 information/state.
 The proposal is:
 * Add a setCommitData() API and separate it from commit() and prepareCommit() 
 (simplify their API)
 * When that API is called, flip on the dirty/changes bit, so that this gets 
 committed even if no other changes were made to the index.
 I will work on a patch and post it.
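
 The proposal above can be sketched with a toy model. ToyCommitWriter and
 all of its methods are hypothetical and only mimic the idea of a
 dirty/changes bit; this is not Lucene's IndexWriter and not the attached
 patch.

```python
class ToyCommitWriter:
    """Toy model of the proposal; not Lucene's IndexWriter."""

    def __init__(self):
        self._dirty = False      # the "changes" bit
        self._commit_data = {}
        self.commits = []        # commit data recorded at each commit

    def add_document(self, doc):
        # Any index change marks the writer as dirty.
        self._dirty = True

    def set_commit_data(self, data):
        # Proposed behaviour: setting commit data alone flips the bit,
        # so the next commit() is not skipped.
        self._commit_data = dict(data)
        self._dirty = True

    def commit(self):
        if not self._dirty:
            return False         # nothing changed, nothing committed
        self.commits.append(dict(self._commit_data))
        self._dirty = False
        return True


writer = ToyCommitWriter()
print(writer.commit())                       # no changes yet
writer.set_commit_data({"app.state": "v2"})
print(writer.commit())                       # commit data alone is enough
```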

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506533#comment-13506533
 ] 

Mark Miller commented on SOLR-139:
--

bq.  we need to look a little closer as to why/whether the updateLog 
directive is really always needed for partial document update.

I believe Yonik chose to implement it by using updateLog features.

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData

2012-11-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506540#comment-13506540
 ] 

Shai Erera commented on LUCENE-4575:


bq. Then the only thing that would need to change is 
DataInput.readStringStringMap: s/HashMap/LinkedHashMap.

So you propose that the code will always initialize LHM in DataInput, that way 
preserving order whether required or not? Yes, I guess that we can do that. But 
I wonder if we should? We didn't so far, and nobody complained. And since it's 
an internal change, we can always make that change in the future if somebody 
asks?





[jira] [Commented] (LUCENE-3668) offsets issues with multiword synonyms

2012-11-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506554#comment-13506554
 ] 

Robert Muir commented on LUCENE-3668:
-

That writeup is a little off.

{quote}
Finally, and most seriously, the SynonymFilterFactory will simply not match 
multi-word synonyms in user queries if you do any kind of tokenization. This is 
because the tokenizer breaks up the input before the SynonymFilterFactory can 
transform it.
{quote}

That's not correct. The bug is in QueryParser: LUCENE-2605.


 offsets issues with multiword synonyms
 --

 Key: LUCENE-3668
 URL: https://issues.apache.org/jira/browse/LUCENE-3668
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 3.6, 4.0-ALPHA

 Attachments: LUCENE-3668.patch, LUCENE-3668_test.patch


 as reported on the list, there are some strange offsets with FSTSynonyms, in 
 the case of multiword synonyms.
 as a workaround it was suggested to use the older synonym impl, but it has 
 bugs too (just in a different way).




[jira] [Comment Edited] (SOLR-4120) Collection API: Support for specifying a list of solrs to spread a new collection across

2012-11-29 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506444#comment-13506444
 ] 

Per Steffensen edited comment on SOLR-4120 at 11/29/12 4:16 PM:


h4. SOLR-4120.patch

h5. Where does it fit

* It fits on top of revision 1412602 of branch lucene_solr_4_0, where the patch 
for SOLR-4114 has already been applied. The following should work if you have a 
checkout of revision 1412602 of branch lucene_solr_4_0
** cd checkout-folder
** patch -s -p0 < SOLR-4114.patch
** patch --ignore-whitespace -p0 < SOLR-4120.patch

You need the --ignore-whitespace - at least with my version of patch on Snow 
Leopard. Probably because I do not have the correct Solr code-style installed 
in my Eclipse. Hmmm, probably should do that.

h5. Content of the patch

The patch modifies the create operation of the Solr Collection API so that it 
accepts a list of Solr nodes across which the shards of the new collection 
should be spread:
* Param key: createNodeSet (OverseerCollectionProcessor.CREATE_NODE_SET)
* Param value: comma-separated list of node-names (equal to the node-names 
received from ClusterState.getLiveNodes())
* Param is not mandatory. If not provided the created collection will still 
have its shards spread across all live nodes
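
A minimal sketch of how a client might build such a create request. Only 
createNodeSet comes from this patch; the helper name, the host, the 
collection name, the node names and the other parameter names 
(action, name, numShards under /admin/collections) are assumptions that 
should be checked against the Solr version in use.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, create_node_set=None):
    """Build a Collection API CREATE request URL (illustrative helper)."""
    params = [("action", "CREATE"), ("name", name),
              ("numShards", str(num_shards))]
    if create_node_set:
        # Comma-separated node names, equal to the names returned by
        # ClusterState.getLiveNodes(). The parameter is optional.
        params.append(("createNodeSet", ",".join(create_node_set)))
    return "%s/admin/collections?%s" % (base_url.rstrip("/"),
                                        urlencode(params))


# Host, collection and node names below are made up for the example.
print(create_collection_url(
    "http://localhost:8983/solr", "mycollection", 2,
    ["host1:8983_solr", "host2:8983_solr"]))
```

When create_node_set is omitted the parameter is left out entirely, which 
matches the default of spreading shards across all live nodes.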

h5. Testing 

BasicDistributedZkTest.testCollectionAPI has been modified to also test this 
feature

 Collection API: Support for specifying a list of solrs to spread a new 
 collection across
 

 Key: SOLR-4120
 URL: https://issues.apache.org/jira/browse/SOLR-4120
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Per Steffensen
Priority: Minor
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4120.patch


 When creating a new collection through the Collection API, the Overseer 
 (handling the creation) will spread shards for this new collection across all 
 live nodes.
 Sometimes you don't want a collection spread across all available nodes. Allow 
 the create operation of the Collection API to take a createNodeSet 
 parameter containing a list of Solr nodes to spread the new shards across. If 
 not provided, it will just spread across all available nodes (default).
 For an example of a concrete case of usage see: 
 https://issues.apache.org/jira/browse/SOLR-4114?focusedCommentId=13505506&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13505506




[jira] [Created] (SOLR-4126) Partial Update retrieve int/float value error

2012-11-29 Thread nihed mbarek (JIRA)
nihed mbarek created SOLR-4126:
--

 Summary: Partial Update retrieve int/float value error
 Key: SOLR-4126
 URL: https://issues.apache.org/jira/browse/SOLR-4126
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
 Environment: Solr 4.0 
Reporter: nihed mbarek


Dear, 

I have a document that I update using the recommendation of this link 
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

as XML/JSON, the result is ok 
<int name="a">109</int>
<float name="b">4.368</float>
<int name="c">5318311</int>

but in my request handler : 

final Document doc = req.getSearcher().doc(x);
final List<IndexableField> fields = doc.getFields();
for (IndexableField indexableField : fields) {
    System.out.println(indexableField.name() + " "
        + indexableField.stringValue());
}

the result is totally out of range : 
a €m
b Àࢼڨ
c €Ԓڧ


This kind of result only appears for fields whose type is not string.




[jira] [Assigned] (LUCENE-4566) SearcherManager.afterRefresh() issues

2012-11-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-4566:
--

Assignee: Michael McCandless

 SearcherManager.afterRefresh() issues
 -

 Key: LUCENE-4566
 URL: https://issues.apache.org/jira/browse/LUCENE-4566
 Project: Lucene - Core
  Issue Type: Bug
Reporter: selckin
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-4566-double-listeners.patch, LUCENE-4566.patch, 
 LUCENE-4566.patch


 1) ReferenceManager.doMaybeRefresh seems to call afterRefresh even if it 
 didn't refresh/swap, (when newReference == null)
 2) It would be nice if users were allowed to override 
 SearcherManager.afterRefresh() to get notified when a new searcher is in 
 action.
 But SearcherManager and ReaderManager are final, while NRTManager is not.
 The only way to currently hook into when a new searcher is created is using 
 the factory, but if you wish to do some async task then, there are no 
 guarantees that acquire() will return the new searcher, so you have to pass 
 it around and incRef manually. Whereas if you were allowed to hook into 
 afterRefresh, you could just rely on acquire() & the existing infra you have 
 around it to give you the latest one.
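
 The two points above can be illustrated with a toy model. 
 ToyReferenceManager, its methods and the string "searchers" are all 
 hypothetical; this only sketches the requested behaviour and is not the 
 Lucene ReferenceManager or the attached patch.

```python
class ToyReferenceManager:
    """Toy stand-in for ReferenceManager; names are illustrative only."""

    def __init__(self, initial, refresh_if_needed):
        self._current = initial
        # Callable returning a new reference, or None if nothing changed.
        self._refresh_if_needed = refresh_if_needed
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def acquire(self):
        return self._current

    def maybe_refresh(self):
        new_reference = self._refresh_if_needed(self._current)
        if new_reference is None:
            return False         # no swap, so no notification (point 1)
        self._current = new_reference
        for callback in self._listeners:
            callback()           # hook asked for in point 2
        return True


versions = iter([None, "searcher-2"])
manager = ToyReferenceManager("searcher-1", lambda current: next(versions))
seen = []
manager.add_listener(lambda: seen.append(manager.acquire()))
manager.maybe_refresh()   # nothing new: listener is not called
manager.maybe_refresh()   # swapped: listener can rely on acquire()
print(seen)
```

 Because the listener runs after the swap, it can call acquire() and get 
 the new reference without having it passed around.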




[jira] [Updated] (LUCENE-4566) SearcherManager.afterRefresh() issues

2012-11-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4566:
---

Attachment: LUCENE-4566.patch

Patch, removing the closed listener (I think we don't need it?) ... I think 
it's ready.





[jira] [Updated] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

2012-11-29 Thread Tom Burton-West (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Burton-West updated LUCENE-4286:


Attachment: LUCENE-4286.patch_3.x

We are still using Solr 3.6 in production so I backported the patch to 
Lucene/Solr 3.6.  Attached as LUCENE-4286.patch_3.x

 Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
 -

 Key: LUCENE-4286
 URL: https://issues.apache.org/jira/browse/LUCENE-4286
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 3.6.1
Reporter: Tom Burton-West
Priority: Minor
 Fix For: 4.0-BETA, 5.0

 Attachments: LUCENE-4286.patch, LUCENE-4286.patch, 
 LUCENE-4286.patch_3.x


 Add an optional  flag to the CJKBigramFilter to tell it to also output 
 unigrams.   This would allow indexing of both bigrams and unigrams and at 
 query time the analyzer could analyze queries as bigrams unless the query 
 contained a single Han unigram.
 As an example here is a configuration a Solr fieldType with the analyzer for 
 indexing with the indexUnigrams flag set and the analyzer for querying 
 without the flag. 
 <fieldType name="CJK" autoGeneratePhraseQueries="false">
   <analyzer type="index">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" han="true"/>
   </analyzer>
 </fieldType>
 Use case: About 10% of our queries that contain Han characters are single 
 character queries.   The CJKBigram filter only outputs single characters when 
 there are no adjacent bigrammable characters in the input.  This means we 
 have to create a separate field to index Han unigrams in order to address 
 single character queries and then write application code to search that 
 separate field if we detect a single character Han query.  This is rather 
 kludgey.  With the optional flag, we could configure Solr as above  
 This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter 
 used to allow single word queries (although that uses word n-grams rather 
 than character n-grams.)
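
 To make the token streams concrete, here is a toy model of the behaviour 
 described above. The function cjk_tokens and its output_unigrams argument 
 are hypothetical; this is not the Lucene filter, and the order of tokens 
 within a position is an assumption.

```python
def cjk_tokens(run, output_unigrams=False):
    """Tokenize one run of adjacent Han characters.

    Bigrams are produced for runs of two or more characters. A lone
    character is emitted as a unigram. With output_unigrams=True every
    character is emitted as a unigram as well.
    """
    tokens = []
    for i, ch in enumerate(run):
        if output_unigrams or len(run) == 1:
            tokens.append(ch)
        if i + 1 < len(run):
            tokens.append(run[i:i + 2])
    return tokens


print(cjk_tokens("日本語"))                        # query side: bigrams only
print(cjk_tokens("日本語", output_unigrams=True))  # index side: both
print(cjk_tokens("日"))                            # single-character query
```

 With unigrams indexed alongside bigrams, the single-character query in the 
 last line can match the same field, so no separate unigram field is needed.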




[jira] [Resolved] (SOLR-4126) Partial Update retrieve int/float value error

2012-11-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-4126.


Resolution: Not A Problem

Nihed,

Solr plugins need to use the IndexSchema to access Documents in order to 
convert the encoded values in those documents into the appropriate Java types.

see for example SolrPluginUtils.docListToSolrDocumentList or 
TextResponseWriter.toSolrDocument.

If you have more questions about writing custom plugins, please ask about them 
on the solr-user list.





pro coding style

2012-11-29 Thread Radim Kolar
If you are talking about my work yesterday, then no reformats were done because 
the code was already properly formatted. Also, all code was hand-written; no 
generated code was used. Generated code is not committed to git anyway.


my hard limits for code quality (checked at commit):
* no findbugs warnings with level 14+
* code coverage 80%
* code coverage in critical parts 95%
* list of PMD warnings to stop commit
* generation of call tree graph - check it for cycles, checking for 
calling same procedure from different levels (indicates bad code flow)

* all eclipse warnings turned into errors
* patched eclipse compiler to do better flow analysis
* code reformatted at commit
* javadoc everything, no warnings

what you should do:
* stuff i do
   +
* ant -> maven
* svn -> git (way better tools)
* split code into small manageable maven modules
* get more people
* put trust into your testing, not into perfect people
* work faster
* use github to track patches
* use springs for integration testing
* use jenkins to do tests on incoming patches
* do library checks for number of functions really used
* contributor patches should be high priority or you will lose contributors

I sometimes give lessons: about 1-2 sessions per year for 14 
people, if I have spare time. But it's a waste of time; most people will not 
follow.


learn this:
SLOW CODING != BUG FREE CODE.
GOOD TESTS + GOOD STATIC TESTING = GOOD BUG FREE CODE
CODE STYLE != GAME WITH SPACES AND { }
GOOD TESTS =  2x TIME NEEDED TO CODE STUFF UNDER TEST
GOOD TESTS ARE MORE VALUABLE THAN GOOD CODE




[jira] [Commented] (SOLR-2701) Expose IndexWriter.commit(Map<String,String> commitUserData) to solr

2012-11-29 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506688#comment-13506688
 ] 

Greg Bowyer commented on SOLR-2701:
---

bq. I haven't had a chance to check out the rest of the patch/issue, but for 
this specifically, what about a convention? Anything under the persistent key 
in the commit data is carried over indefinitely. Or if persistent is the norm, 
then we could reverse it and have a transient map that is not carried over.

The persistent/transient map sounds like a good idea; I will take a look at how 
that can be implemented
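
One possible reading of the persistent convention, as a rough sketch only. 
It assumes commit data is a flat string-to-string map and that persistent 
entries are marked by a key prefix; the function name and the "persistent." 
prefix are made up here and are not what the patch does.

```python
def next_commit_data(previous, new, prefix="persistent."):
    """Compute the commit data for the next commit.

    Entries of the previous commit whose key starts with the prefix are
    carried over; everything else from the previous commit is dropped.
    Values supplied for the new commit always win.
    """
    carried = {k: v for k, v in previous.items() if k.startswith(prefix)}
    carried.update(new)
    return carried


previous = {"persistent.seq": "41", "batch": "a"}
print(next_commit_data(previous, {"batch": "b"}))
print(next_commit_data(previous, {"persistent.seq": "42"}))
```

The reversed convention mentioned in the quote would drop only the entries 
under a transient prefix and carry everything else.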

 Expose IndexWriter.commit(Map<String,String> commitUserData) to solr 
 -

 Key: SOLR-2701
 URL: https://issues.apache.org/jira/browse/SOLR-2701
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0-ALPHA
Reporter: Eks Dev
Priority: Minor
  Labels: commit, update
 Attachments: SOLR-2701-Expose-userCommitData-throughout-solr.patch, 
 SOLR-2701.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 At the moment, there is no feature that enables associating user information 
 to the commit point.
  
 Lucene supports this possibility and it should be exposed to solr as well, 
 probably via beforeCommit Listener (analogous to prepareCommit in Lucene).
 Most likely home for this Map to live is UpdateHandler.
 Example use case would be an atomic tracking of sequence numbers or 
 timestamps for incremental updates.




[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-29 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506735#comment-13506735
 ] 

Hoss Man commented on SOLR-139:
---

bq. I believe yonik chose to implement it by using updateLog features.

I think it has to be - the real-time get support provided by the updateLog is 
the only way to guarantee that the document will be available to atomically 
update it.

Lukas: if the atomic update code path isn't throwing a big fat error when you 
try to use it w/o updateLog configured, then that sounds to me like a bug -- 
can you please file a Jira for that?





IndexWriter.ensureOpen and ensureOpen(boolean)

2012-11-29 Thread Shai Erera
Hi

While working on LUCENE-4575 I noticed what I thought was an inconsistency
between prepareCommit() and prepareCommit(commitData).
The former called ensureOpen(true) and the latter ensureOpen(false). At
first I thought that this is a bug, so I fixed both to call
ensureOpen(true),
especially now that I consolidate the two prepCommit() versions into one,
but then all tests failed with AlreadyClosedException. How wonderful :).

Getting deeper into the meaning of the two ensureOpen versions, I realize
that the boolean means something like "fail if IW has been closed, or is
in the process of closing". Some methods choose not to fail if IW is in the
process of closing, while others do (mostly internal methods).
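
To make the distinction concrete, here is a minimal, self-contained sketch of such a two-state check. It is not the IndexWriter source: the class name `WriterStateSketch`, its `closed`/`closing` fields, the `beginClose()`/`finishClose()` helpers and the use of `IllegalStateException` (standing in for AlreadyClosedException) are all assumptions made for illustration.

```java
// Hypothetical stand-in for a writer that distinguishes "closing" from "closed".
class WriterStateSketch {
    private volatile boolean closed;   // close() has finished
    private volatile boolean closing;  // close() has started but not finished

    // failIfClosing == true:  reject the call as soon as close() has started.
    // failIfClosing == false: reject the call only once close() has finished.
    void ensureOpen(boolean failIfClosing) {
        if (closed || (failIfClosing && closing)) {
            throw new IllegalStateException("this writer is closed");
        }
    }

    // Convenience overload that uses the stricter check.
    void ensureOpen() {
        ensureOpen(true);
    }

    void beginClose() {
        closing = true;
    }

    void finishClose() {
        closed = true;
        closing = false;
    }
}
```

Under these assumptions, a method that close() itself needs to call (a final commit, say) would have to use the lenient form, which might be one reason every test failed once both prepareCommit variants used the strict form.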

My question is - why make the distinction? If IW is in the process of
closing, why not always fail?

Shai


Re: pro coding style

2012-11-29 Thread Simon Willnauer
hey,

some comments inline...

On Thu, Nov 29, 2012 at 7:48 PM, Radim Kolar h...@filez.com wrote:
 if you talk about my yesterday work then no reformats were done because code
 was already properly formatted. Also all code was hand written, no generated
 code was used. Generated code is not committed to git anyway.

 my hard limits for code quality (checked at commit):
 * no findbugs warnings with level 14+
 * code coverage 80%
 * code coverage in critical parts 95%
 * list of PMD warnings to stop commit
 * generation of call tree graph - check it for cycles, checking for calling
 same procedure from different levels (indicates bad code flow)
 * all eclipse warnings turned into errors
 * patched eclipse compiler to do better flow analysis
 * code reformatted at commit
 * javadoc everything, no warnings

 what you should do:
 * stuff i do
+
 * ant -> maven

I suggest you start with this, make sure you have enough time and
energy for the discussion.

 * svn -> git (way better tools)

I think we had this discussion already and it seems that lots of folks
are positive, yet there is still some barrier infrastructure-wise
along the lines.
 * split code into small manageable maven modules
see above - we have a fully functional maven build but ant is our
primary build. My honest opinion: forget what I said above - don't try.
 * get more people
good point - can you refer us some? In my experience they are pretty
hard to find.

 * put trust into your testing, not into perfect people

ahh yeah testing, we should do that at some point

 * work faster

wow - I never thought about that though!
 * use github to track patches

wait why is github good for patches?

 * use springs for integration testing

sorry we are a no-dependency library.

 * use jenkins to do tests on incoming patches

patches welcome

 * do library checks for number of functions really used

hmm - we are a library?

 * contributor patches should be high priority or you will lose contributors

that's good advice for such a young project.

 i am giving sometimes lessons: about 1-2 sessions per year for 14 people, if
 i have spare time. But its waste of time, most ppl will not follow.

 learn this:
 SLOW CODING != BUG FREE CODE.
 GOOD TESTS + GOOD STATIC TESTING = GOOD BUG FREE CODE
 CODE STYLE != GAME WITH SPACES AND { }
 GOOD TESTS =  2x TIME NEEDED TO CODE STUFF UNDER TEST
 GOOD TESTS ARE MORE VALUABLE THAN GOOD CODE

let's drop the code, it's a hassle to maintain anyway!

thanks man,

this mail made my day!

simon




[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-11-29 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506860#comment-13506860
 ] 

Commit Tag Bot commented on SOLR-3849:
--

[trunk commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1415402

SOLR-3849: Maven configuration += -Djava.awt.headless=true; also, upgrade 
maven-surefire-plugin to 2.12.4
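
The failure quoted in this issue is a "system properties invariant" check reporting keys that appeared while the suite ran (here `sun.awt.enableExtraMouseButtons` and `sun.font.fontmanager`). As a rough illustration of that kind of check, and not the randomizedtesting implementation, the sketch below snapshots the property names before and after a piece of work and reports the new ones. The class name, method name and the `sketch.leaked.key` property are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: detects system properties created by a piece of work.
class NewSystemPropertiesSketch {

    // Returns the property keys present after `work` runs that were absent before.
    static Set<String> newKeysAfter(Runnable work) {
        Set<String> before = new TreeSet<>(System.getProperties().stringPropertyNames());
        work.run();
        Set<String> after = new TreeSet<>(System.getProperties().stringPropertyNames());
        after.removeAll(before);
        return after;
    }

    public static void main(String[] args) {
        // Simulate code under test that leaves a property behind.
        Set<String> leaked = newKeysAfter(
            () -> System.setProperty("sketch.leaked.key", "true"));
        System.out.println("New keys: " + leaked);
        System.clearProperty("sketch.leaked.key");
    }
}
```

Passing -Djava.awt.headless=true, as the commit does, presumably keeps the AWT start-up that creates those keys from happening at all, rather than relaxing the check.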



 ScriptEngineTest failure RE system properties and ThreadLeakError
 -

 Key: SOLR-3849
 URL: https://issues.apache.org/jira/browse/SOLR-3849
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 5.0
 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
Reporter: David Smiley
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: SOLR-3849.patch


 100% reproducible for me:
 solr$ ant test  -Dtestcase=ScriptEngineTest
 {noformat}
 [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4
 [junit4:junit4] Executing 1 suite with 1 JVM.
 [junit4:junit4] 
 [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest
 [junit4:junit4] OK  0.17s | ScriptEngineTest.testPut
 [junit4:junit4] OK  0.02s | ScriptEngineTest.testEvalReader
 [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby
 [junit4:junit4] Assumption #1: got: [null], expected: each not null
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testEvalText
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByExtension
 [junit4:junit4] OK  0.01s | ScriptEngineTest.testGetEngineByName
 [junit4:junit4]   2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING 
 Will linger awaiting termination of 2 leaked thread(s).
 [junit4:junit4]   2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 
 thread leaked from SUITE scope at 
 org.apache.solr.update.processor.ScriptEngineTest: 
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll 
 Starting to interrupt leaked threads:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE 
 There are still zombie threads that couldn't be terminated:
 [junit4:junit4]   2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, 
 group=main]
 [junit4:junit4]   2  at (empty stack)
 [junit4:junit4]   2 NOTE: test params are: codec=SimpleText, 
 sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, 
 timezone=America/Edmonton
 [junit4:junit4]   2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 
 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728
 [junit4:junit4]   2 NOTE: All tests run in this JVM: [ScriptEngineTest]
 [junit4:junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true 
 -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton 
 -Dtests.file.encoding=UTF-8
 [junit4:junit4] ERROR   0.00s | ScriptEngineTest (suite) 
 [junit4:junit4] Throwable #1: java.lang.AssertionError: System 
 properties invariant violated.
 [junit4:junit4] New keys:
 [junit4:junit4]   sun.awt.enableExtraMouseButtons=true
 [junit4:junit4]   sun.font.fontmanager=sun.font.CFontManager
 [junit4:junit4] 
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 [junit4:junit4]  at 
 org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 [junit4:junit4]  at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 [junit4:junit4]  at 
 

[jira] [Commented] (SOLR-3602) Look into updating to ZooKeeper 3.4.5

2012-11-29 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506868#comment-13506868
 ] 

Commit Tag Bot commented on SOLR-3602:
--

[trunk commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1415408

SOLR-3602: Maven configuration: Exclude new zookeeper 3.4.5 transitive 
dependency org.slf4j:slf4j-log4j12



 Look into updating to ZooKeeper 3.4.5
 -

 Key: SOLR-3602
 URL: https://issues.apache.org/jira/browse/SOLR-3602
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.1, 5.0


 Looks like 3.4.4 may be considered stable - if that happens, we should look 
 into updating.
 Otherwise, we should keep an eye out for 3.3.6




[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError

2012-11-29 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506878#comment-13506878
 ] 

Commit Tag Bot commented on SOLR-3849:
--

[branch_4x commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1415410

SOLR-3849: Maven configuration += -Djava.awt.headless=true; also, upgrade 
maven-surefire-plugin to 2.12.4 (merge trunk r1415402)




[jira] [Updated] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-29 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4574:
-

Attachment: LUCENE-4574.patch

I've thought about this some more and chatted with Yonik & Adrien in IRC.

Attached is a new patch.  In a nutshell, the caching is done via 
ScoreCachingWrappingScorer and is applied by TopFieldCollector but only when 
one of the comparators is a RelevanceComparator.  I believe this is the only 
case when the score could be retrieved more than once per document.

To implement this patch, I did a little refactoring.  I pulled a Scorer field 
that was common to all subclasses of TopFieldCollector into TFC, and I added a 
getFieldComparators() abstract method that is implemented trivially by all its 
subclasses.  setScorer() is now implemented only at TFC and none of its 
subclasses.

If this seems reasonable, perhaps it would be good to make a further 
refactoring such that FieldComparator.setScorer() doesn't exist; leave it 
specific to RelevanceComparator or introduce an abstract class 
FieldComparatorNeedsScorer.  After all, in Lucene only RelevanceComparator 
needs it.
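
The idea behind wrapping the scorer can be sketched independently of Lucene. The interface and class below are hypothetical simplifications, not the Lucene Scorer or ScoreCachingWrappingScorer API: the wrapper remembers which document it last scored and returns the stored value when asked again for the same document.

```java
// Hypothetical, simplified scorer abstraction (not Lucene's Scorer API).
interface DocScorerSketch {
    int docID();
    float score();
}

// Caches the score of the current document so that repeated score() calls
// for the same document reach the (possibly expensive) delegate only once.
class CachingScorerSketch implements DocScorerSketch {
    private final DocScorerSketch delegate;
    private boolean hasCached;   // false until the first score is computed
    private int cachedDoc;
    private float cachedScore;

    CachingScorerSketch(DocScorerSketch delegate) {
        this.delegate = delegate;
    }

    @Override
    public int docID() {
        return delegate.docID();
    }

    @Override
    public float score() {
        int doc = delegate.docID();
        if (!hasCached || doc != cachedDoc) {
            cachedScore = delegate.score();  // the only place the delegate is scored
            cachedDoc = doc;
            hasCached = true;
        }
        return cachedScore;
    }
}
```

The cache only helps if every consumer of the score for a given document goes through the same wrapper instance.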

 FunctionQuery ValueSource value computed twice per document
 ---

 Key: LUCENE-4574
 URL: https://issues.apache.org/jira/browse/LUCENE-4574
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0, 4.1
Reporter: David Smiley
 Attachments: LUCENE-4574.patch, LUCENE-4574.patch, 
 Test_for_LUCENE-4574.patch


 I was working on a custom ValueSource and did some basic profiling and 
 debugging to see if it was being used optimally.  To my surprise, the value 
 was being fetched twice per document in a row.  This computation isn't 
 exactly cheap to calculate so this is a big problem.  I was able to 
 work-around this problem trivially on my end by caching the last value with 
 corresponding docid in my FunctionValues implementation.
 Here is an excerpt of the code path to the first execution:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 And here is the 2nd call:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
 at 
 org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 The 2nd call appears to use some score caching mechanism, which is all well 
 and good, but that same mechanism wasn't used in the first call so there's no 
 cached value to retrieve.




[jira] [Commented] (SOLR-3602) Look into updating to ZooKeeper 3.4.5

2012-11-29 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506918#comment-13506918
 ] 

Commit Tag Bot commented on SOLR-3602:
--

[branch_4x commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1415411

SOLR-3602: Maven configuration: Exclude new zookeeper 3.4.5 transitive 
dependency org.slf4j:slf4j-log4j12 (merge trunk r1415408)







[jira] [Created] (SOLR-4127) Atomic updates used w/o updateLog should throw an error

2012-11-29 Thread Lukas Graf (JIRA)
Lukas Graf created SOLR-4127:


 Summary: Atomic updates used w/o updateLog should throw an error
 Key: SOLR-4127
 URL: https://issues.apache.org/jira/browse/SOLR-4127
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
Reporter: Lukas Graf


The atomic update feature described in 
[SOLR-139|https://issues.apache.org/jira/browse/SOLR-139] seems to depend on 
having an {{<updateLog />}} configured in {{solrconfig.xml}}.

When used without an update log, the update commands like {{set}} or {{add}} 
don't result in an error and the transaction being aborted, but produce garbled 
documents instead. This is the case for both the XML and JSON formats for the 
update message.

Example:

I initially created some content like this:

{code}
$ curl 'localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' -d '
[{"id":"7cb8a43c","Title":"My original Title","Creator":"John Doe"}]'
{code}

Which resulted in this document:

{code:xml}
<doc>
<str name="id">7cb8a43c</str>
<str name="Title">My original Title</str>
<str name="Creator">John Doe</str>
</doc>
{code}

Then I attempted to update that document with this statement:

{code}
$ curl 'localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' -d '
[{"id":"7cb8a43c","Title":{"set":"My new title"}}]'
{code}

Which resulted in this garbled document, with the fields that weren't updated 
missing:

{code:xml}
<doc>
<str name="id">7cb8a43c</str>
<str name="Title">{set=My new title}</str>
</doc>
{code}
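
The behaviour this issue asks for can be sketched as a guard that runs before the document is indexed. This is only an illustration under stated assumptions, not Solr code: the class and method names are hypothetical, the document is modelled as a plain `Map`, and any field whose value is itself a map (the shape of `{"set": ...}` or `{"add": ...}`) is treated as an atomic update operation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical guard: refuse atomic update operations when no update log exists.
class AtomicUpdateCheckSketch {

    // Throws if the document carries an atomic operation but no update log is configured.
    static void validate(Map<String, Object> doc, boolean updateLogConfigured) {
        if (updateLogConfigured) {
            return;  // atomic updates are assumed to be supported in this case
        }
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            if (field.getValue() instanceof Map) {
                throw new IllegalArgumentException(
                    "Atomic update on field '" + field.getKey()
                    + "' requires an update log to be configured");
            }
        }
    }

    public static void main(String[] args) {
        // Same shape as the update message in the report above.
        Map<String, Object> op = new LinkedHashMap<>();
        op.put("set", "My new title");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "7cb8a43c");
        doc.put("Title", op);
        try {
            validate(doc, false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast like this would abort the request instead of storing the literal `{set=My new title}` string as the field value.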




[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable

2012-11-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506938#comment-13506938
 ] 

Rafał Kuć commented on LUCENE-4542:
---

Chris, should anything else be done here in your opinion, or is it ready to be 
committed?

 Make RECURSION_CAP in HunspellStemmer configurable
 --

 Key: LUCENE-4542
 URL: https://issues.apache.org/jira/browse/LUCENE-4542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Piotr
Assignee: Chris Male
 Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, 
 LUCENE-4542-with-solr.patch


 Currently there is 
 private static final int RECURSION_CAP = 2;
 in the code of the class HunspellStemmer. It makes using hunspell with 
 several dictionaries almost unusable, due to bad performance (e.g. it costs 
 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for 
 recursion_cap=1). It would be nice to be able to tune this number as needed.
 AFAIK this number (2) was chosen arbitrarily.
 (This is the first issue I have ever filed, so please forgive any mistakes.)
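
One way to picture the requested change is a stemmer that takes the cap as a constructor argument and falls back to the current value of 2. The class below is a toy suffix stripper written for illustration only; it is not HunspellStemmer, and every name in it (`CappedStemmerSketch`, `DEFAULT_RECURSION_CAP`, the suffix list) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy recursive suffix stripper with a configurable recursion cap.
class CappedStemmerSketch {
    static final int DEFAULT_RECURSION_CAP = 2;

    private final List<String> suffixes;
    private final int recursionCap;

    // Keeps the existing behaviour for callers that do not care about the cap.
    CappedStemmerSketch(List<String> suffixes) {
        this(suffixes, DEFAULT_RECURSION_CAP);
    }

    CappedStemmerSketch(List<String> suffixes, int recursionCap) {
        if (recursionCap < 0) {
            throw new IllegalArgumentException("recursionCap must be >= 0");
        }
        this.suffixes = suffixes;
        this.recursionCap = recursionCap;
    }

    // Returns the word followed by every candidate stem found within the cap.
    List<String> stem(String word) {
        List<String> out = new ArrayList<>();
        out.add(word);
        strip(word, 0, out);
        return out;
    }

    private void strip(String word, int depth, List<String> out) {
        for (String suffix : suffixes) {
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                String stem = word.substring(0, word.length() - suffix.length());
                out.add(stem);
                if (depth < recursionCap) {
                    strip(stem, depth + 1, out);  // only recurse while under the cap
                }
            }
        }
    }
}
```

Each extra level multiplies the number of candidates that have to be checked, which is consistent with the reported difference in stemming time between a cap of 1 and a cap of 2.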




[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-29 Thread Lukas Graf (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506942#comment-13506942
 ] 

Lukas Graf commented on SOLR-139:
-

Filed [SOLR-4127|https://issues.apache.org/jira/browse/SOLR-4127]: Atomic 
updates used w/o updateLog should throw an error

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




Re: pro coding style

2012-11-29 Thread Radim Kolar



what you should do:
* stuff i do
+
* ant -> maven

I suggest you start with this, make sure you have enough time and
energy for the discussion.
I don't have either. If I decide to go with SOLR instead of EC, I will 
fork it. It will save me a lot of time.





* svn -> git (way better tools)

I think we had this discussion already and it seems that lots of folks
are positive, yet there is still some barrier infrastructure-wise
along the lines.

dont blame infrastructure, other apache projects are using it.


* split code into small manageable maven modules

see above - we have a fully functional maven build but ant is our
primary build.

i dont see pom.xml in your source tree.

good point - can you refer us some? In my experience they are pretty
hard to find.
i do not know people who believe that process designed to be slow is a 
good process. We here believe that fast process = high salary.



* use github to track patches
wait why is github good for patches?
you can track patch revisions and apply/browse/comment on them easily. Also 
it's way easier to upload a patch and do a pull request than to attach it to a 
ticket in jira.



* use springs for integration testing
sorry we are a no-dependency library.

<scope>test</scope>




Re: Active 4.x branches?

2012-11-29 Thread Radim Kolar

 How can you expect stability out of that?

unit + integration testing. If it passes tests, it's no different from 
old code.





[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507045#comment-13507045
 ] 

Robert Muir commented on LUCENE-4574:
-

Just to bold what I said before, as I feel it's important here:

{quote}
Finally, we could also consider something like your patch, except more honed in 
these particular silly situations. so thats something like,
up-front setting a boolean in these collectors ctors if one of the comparators 
is relevance *and also* its asked to track scores/max scores. 
{quote}

Seems like we are doing it always if there is a relevance comparator? I feel 
like the caching (which I hate) should be confined to exactly what's minimal 
and necessary to prevent score from being called twice.
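
The narrower condition in the quoted suggestion can be written down as a single up-front decision. The sketch below is an assumption-laden illustration rather than TopFieldCollector code: the enum, the class and the method name are hypothetical, and the two boolean parameters simply mirror the "track scores / max scores" options mentioned above.

```java
import java.util.List;

// Hypothetical up-front decision: wrap the scorer in a cache only when the
// score would otherwise be read both by the collector and by a comparator.
class ScoreCachingDecisionSketch {

    // Simplified comparator kinds; only RELEVANCE reads the score.
    enum ComparatorKind { RELEVANCE, FIELD_VALUE, DOC_ORDER }

    static boolean needsScoreCaching(List<ComparatorKind> comparators,
                                     boolean trackDocScores,
                                     boolean trackMaxScore) {
        boolean sortsByRelevance = comparators.contains(ComparatorKind.RELEVANCE);
        boolean collectorReadsScore = trackDocScores || trackMaxScore;
        // Caching pays off only when both consumers of the score are present.
        return sortsByRelevance && collectorReadsScore;
    }
}
```

With a relevance comparator but no score tracking, the score is read once per document in this model, so no wrapper would be installed.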






[jira] [Updated] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization

2012-11-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-4123:
--

Attachment: SOLR-4123.patch

patch with that above syntax (which i'm not sure I even like).

may not work: haven't tested at all.

 ICUTokenizerFactory - per-script RBBI customization
 ---

 Key: SOLR-4123
 URL: https://issues.apache.org/jira/browse/SOLR-4123
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Shawn Heisey
 Fix For: 4.1, 5.0

 Attachments: SOLR-4123.patch


 Initially this started out as an idea for a configuration knob on 
 ICUTokenizer that would allow me to tell it not to tokenize on punctuation.  
 Through IRC discussion on #lucene, it sorta ballooned.  The committers had a 
 long discussion about it that I don't really understand, so I'll be including 
 it in the comments.
 I am a Solr user, so I would also need the ability to access the 
 configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4128) multivalued dynamicField matching 'score' causes text response writers to output score as an array

2012-11-29 Thread Aaron Daubman (JIRA)
Aaron Daubman created SOLR-4128:
---

 Summary: multivalued dynamicField matching 'score' causes text 
response writers to output score as an array
 Key: SOLR-4128
 URL: https://issues.apache.org/jira/browse/SOLR-4128
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 4.0
 Environment: all
Reporter: Aaron Daubman
Priority: Minor


With a schema that includes a dynamic field that matches 'score' (e.g. s* or 
even just *) text response writers (json, python, etc...) will return score as 
an array, e.g.:
score: [
17.522964
]

For now, a workaround (courtesy of hoss) is adding a non-indexed, non-stored, 
non-multivalued 'score' field to schema.xml, e.g.:
<field name="score" type="string" indexed="false" stored="false" 
multiValued="false"/>

Note that this will happen for anybody following the older default schema.xml 
where * was used to ignore undesired fields (e.g. as mentioned in 
https://issues.apache.org/jira/browse/SOLR-217?focusedCommentId=12492357&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12492357
 )
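
Why the workaround helps can be sketched with a simplified model of field lookup. This is only an illustration, not Solr's code: it assumes that explicitly declared fields are consulted before dynamicField patterns, and the function name resolve_field, the dictionaries, and the "ignored" type with multiValued=true for the * pattern are all hypothetical stand-ins.

```python
from fnmatch import fnmatchcase


def resolve_field(name, explicit_fields, dynamic_fields):
    """Return the properties that apply to `name` in this simplified model."""
    # Assumption: an explicitly declared field wins over any dynamic pattern.
    if name in explicit_fields:
        return explicit_fields[name]
    # Otherwise the first matching glob pattern (e.g. "s*" or "*") applies.
    for pattern, props in dynamic_fields:
        if fnmatchcase(name, pattern):
            return props
    return None


# Catch-all pattern used to ignore undesired fields (assumed to be multivalued).
dynamic_fields = [("*", {"type": "ignored", "multiValued": True})]

# Without an explicit 'score' field, the catch-all pattern claims the name.
before = resolve_field("score", {}, dynamic_fields)

# With the workaround, the explicit single-valued field is found first.
explicit_fields = {
    "score": {"type": "string", "indexed": False, "stored": False,
              "multiValued": False},
}
after = resolve_field("score", explicit_fields, dynamic_fields)
```

In this model, before reports multiValued as True (hence the array in the response), while after reports it as False.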




[jira] [Updated] (SOLR-4128) multivalued dynamicField matching 'score' causes text response writers to output score as an array

2012-11-29 Thread Aaron Daubman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Daubman updated SOLR-4128:


Description: 
With a schema that includes a dynamic field that matches 'score' (e.g. s* or 
even just *) text response writers (json, python, etc...) will return score as 
an array, e.g.:
score: [
17.522964
]

For now, a workaround (courtesy of hoss) is adding a non-indexed, non-stored, 
non-multivalued 'score' field to schema.xml, e.g.:
<field name="score" type="string" indexed="false" stored="false" 
multiValued="false"/>

Note that this will happen for anybody following the current (or older) example 
schema.xml where * was used to ignore undesired fields (from: SOLR-217):
https://github.com/apache/lucene-solr/blob/trunk/solr/example/solr/collection1/conf/schema.xml#L214

  was:
With a schema that includes a dynamic field that matches 'score' (e.g. s* or 
even just *) text response writers (json, python, etc...) will return score as 
an array, e.g.:
score: [
17.522964
]

For now, a workaround (courtesy of hoss) is adding a non-indexed, non-stored, 
non-multivalued 'score' field to schema.xml, e.g.:
<field name="score" type="string" indexed="false" stored="false" 
multiValued="false"/>

Note that this will happen for anybody following the older default schema.xml 
where * was used to ignore undesired fields (e.g. as mentioned in 
https://issues.apache.org/jira/browse/SOLR-217?focusedCommentId=12492357&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12492357
 )


 multivalued dynamicField matching 'score' causes text response writers to 
 output score as an array
 --

 Key: SOLR-4128
 URL: https://issues.apache.org/jira/browse/SOLR-4128
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 4.0
 Environment: all
Reporter: Aaron Daubman
Priority: Minor
  Labels: array, ignore, schema, score

 With a schema that includes a dynamic field that matches 'score' (e.g. s* or 
 even just *) text response writers (json, python, etc...) will return score 
 as an array, e.g.:
 score: [
 17.522964
 ]
 For now, a workaround (courtesy of hoss) is adding a non-indexed, non-stored, 
 non-multivalued 'score' field to schema.xml, e.g.:
 <field name="score" type="string" indexed="false" stored="false" 
 multiValued="false"/>
 Note that this will happen for anybody following the current (or older) 
 example schema.xml where * was used to ignore undesired fields (from: 
 SOLR-217):
 https://github.com/apache/lucene-solr/blob/trunk/solr/example/solr/collection1/conf/schema.xml#L214




Re: composition of different queries based scores

2012-11-29 Thread sri krishna
To boost the terms in that same example, is the following valid?

(hello^0.5* OR hello^0.5~)


On Tue, Nov 27, 2012 at 11:22 PM, Jack Krupansky j...@basetechnology.com wrote:

   The fuzzy option will be ignored here – you cannot combine fuzzy and
 wild on the same term, although you could do an OR of the two:

 (hello* OR hello~)

 -- Jack Krupansky

  *From:* sri krishna krishnai...@gmail.com
 *Sent:* Tuesday, November 27, 2012 11:08 AM
 *To:* dev@lucene.apache.org
 *Subject:* composition of different queries based scores

 For a search string hello*~, how is the score calculated?

 The formula given at
 http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/org/apache/lucene/search/Similarity.html
 does not take edit distance or prefix-term factors into account.

 Does Lucene add up the scores obtained from each type of query included,
 i.e. for the above query, actual score = default scoring + 1/(edit
 distance) + prefix match score? If so, there is no normalization between
 the scores. If not, what approach does Lucene follow, from separating the
 query by its identifiers (~ for edit distance, * for prefix query, etc.)
 to the actual scoring?







[jira] [Assigned] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-29 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-4574:


Assignee: David Smiley

 FunctionQuery ValueSource value computed twice per document
 ---

 Key: LUCENE-4574
 URL: https://issues.apache.org/jira/browse/LUCENE-4574
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0, 4.1
Reporter: David Smiley
Assignee: David Smiley
 Attachments: LUCENE-4574.patch, LUCENE-4574.patch, 
 Test_for_LUCENE-4574.patch


 I was working on a custom ValueSource and did some basic profiling and 
 debugging to see if it was being used optimally.  To my surprise, the value 
 was being fetched twice per document in a row.  This computation isn't 
 exactly cheap to calculate so this is a big problem.  I was able to 
 work around this problem trivially on my end by caching the last value with 
 corresponding docid in my FunctionValues implementation.
 Here is an excerpt of the code path to the first execution:
 {noformat}
 at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 And here is the 2nd call:
 {noformat}
 at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
 at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
 at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 The 2nd call appears to use some score caching mechanism, which is all well 
 and good, but that same mechanism wasn't used in the first call so there's no 
 cached value to retrieve.
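
The work-around mentioned in the description (remembering the last value together with its docid) might look roughly like the sketch below. It is a language-neutral illustration written in Python, not the reporter's FunctionValues code: the class name CachingValues, the method float_val, and the expensive callback are hypothetical.

```python
class CachingValues:
    """Hypothetical wrapper that remembers the most recent (docid, value) pair."""

    def __init__(self, compute):
        self._compute = compute      # expensive per-document function (assumed)
        self._last_doc = None        # docid of the cached value, None if empty
        self._last_value = None

    def float_val(self, doc):
        # Recompute only when a different document is requested.
        if doc != self._last_doc:
            self._last_value = self._compute(doc)
            self._last_doc = doc
        return self._last_value


calls = []


def expensive(doc):
    calls.append(doc)                # record every real computation
    return doc * 0.5


values = CachingValues(expensive)
first = values.float_val(7)
second = values.float_val(7)         # served from the one-entry cache
```

A one-entry cache is enough here because the two fetches for a document happen back to back; it would not help if other documents were requested in between.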




Re: pro coding style

2012-11-29 Thread Dawid Weiss
 I don't see pom.xml in your source tree.

Instead of educating others about what's good and bad how about if you
take some more time studying the sources of Lucene/Solr and its build
system? Your observations are superficial to say the least: POM files
are generated dynamically, the test infrastructure is among the more
sophisticated things to be found; with multiple CI systems running the
code all the time, the coverage is great across JVMs, the
randomization really brings up bugs nobody thought to cover manually.

Dawid
