date:20110316


 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-2945:
-

Attachment: LUCENE-2945d.patch

Basically the 2945d patch of 16 March 2011 is a refactoring of the 2945c patch. 
The static inner classes have been moved to package private classes, and their 
common function was moved to a new super class.

Also a few more test cases were added. Test cases for testing not equals might 
still be added, but I don't see a real need to do that.

As this adds handling equals/hashcode and has hardly any redundancy, I think 
this is committable.


 Surround Query doesn't properly handle equals/hashcode
 --

 Key: LUCENE-2945
 URL: https://issues.apache.org/jira/browse/LUCENE-2945
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1.1, 4.0

 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
 LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch


 In looking at using the surround queries with Solr, I am hitting issues 
 caused by collisions due to equals/hashcode not being implemented on the 
 anonymous inner classes that are created by things like DistanceQuery (branch 
 3.x, near line 76)
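
 As a rough illustration of the fix direction discussed in this issue (named 
 classes with field-based equals/hashCode instead of anonymous inner classes), 
 here is a minimal sketch. The class name and its fields are hypothetical and 
 are not taken from the patch; how exactly the reported collisions arise 
 depends on the equals/hashCode that the anonymous classes inherit, which this 
 sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical named replacement for an anonymous inner class; fields are illustrative only. */
final class DistanceSubQuerySketch {
    private final String fieldName;
    private final int distance;
    private final boolean ordered;

    DistanceSubQuerySketch(String fieldName, int distance, boolean ordered) {
        this.fieldName = Objects.requireNonNull(fieldName);
        this.distance = distance;
        this.ordered = ordered;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof DistanceSubQuerySketch)) return false;
        DistanceSubQuerySketch that = (DistanceSubQuerySketch) other;
        // Equality is defined by every field that changes what the query matches.
        return distance == that.distance
            && ordered == that.ordered
            && fieldName.equals(that.fieldName);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals so equal objects hash alike.
        return Objects.hash(fieldName, distance, ordered);
    }

    public static void main(String[] args) {
        Map<DistanceSubQuerySketch, String> cache = new HashMap<>();
        cache.put(new DistanceSubQuerySketch("body", 3, true), "cached result");
        // A structurally identical key finds the entry; a different distance does not.
        System.out.println(cache.get(new DistanceSubQuerySketch("body", 3, true)));
        System.out.println(cache.get(new DistanceSubQuerySketch("body", 4, true)));
    }
}
```

 With field-based equality, a cache keyed on such objects can neither confuse 
 two different queries nor miss a repeated one.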

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode


[ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007381#comment-13007381
 ] 

Paul Elschot edited comment on LUCENE-2945 at 3/16/11 8:27 AM:
---

Basically the 2945d patch of 16 March 2011 is a refactoring of the 2945c patch. 
The static inner classes have been moved to package private classes, and their 
common function was moved to a new super class.

Also a few more test cases were added. Test cases for testing not equals might 
still be added, but I don't see a real need to do that.

As this adds handling equals/hashcode and has hardly any redundancy, I think 
this is close to committable. The patch also deprecates a compare..() method; I 
don't know whether the comments there are to the point.


  


[jira] Commented: (LUCENE-2968) SurroundQuery doesn't support SpanNot


[ 
https://issues.apache.org/jira/browse/LUCENE-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007385#comment-13007385
 ] 

Paul Elschot commented on LUCENE-2968:
--

SpanNot filters on no(t) overlap. Any idea for an operator name?
spn nov nto ... ?

 SurroundQuery doesn't support SpanNot
 -

 Key: LUCENE-2968
 URL: https://issues.apache.org/jira/browse/LUCENE-2968
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor

 It would be nice if we could do span not in the surround query, as they are 
 quite useful for keeping searches within a boundary (say a sentence)


[jira] Commented: (LUCENE-2968) SurroundQuery doesn't support SpanNot


[ 
https://issues.apache.org/jira/browse/LUCENE-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007388#comment-13007388
 ] 

Paul Elschot commented on LUCENE-2968:
--

This could also be an opportunity to port Surround to the new query parser in 
Lucene.



[jira] Created: (SOLR-2430) Swapping cores with persistent switched on should save swapped core to defaultCoreName

2011-03-16 Thread bidorbuy (JIRA)

Swapping cores with persistent switched on should save swapped core to 
defaultCoreName
--

 Key: SOLR-2430
 URL: https://issues.apache.org/jira/browse/SOLR-2430
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.0
 Environment: CentOS
Reporter: bidorbuy


I am running the latest trunk version with multiple cores configured, 
persistent turned on, and a default core set. When swapping cores, I would have 
expected the default behavior to be that the swapped core name is persisted as 
the new defaultCoreName, i.e. when switching from primary to staging, 
defaultCoreName should be written as staging.

When swapping cores (i.e. from primary to staging) and then restarting Jetty, 
Solr falls back to the currently configured default core (=primary) instead of 
the previously swapped one (=staging). If this is intended, could the swap 
command perhaps be extended to force rewriting solr.xml?

Current config file:
<?xml version="1.0" encoding="UTF-8" ?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true" defaultCoreName="primary">
    <core name="primary" instanceDir="conf/primary/" dataDir="../../data/primary/"/>
    <core name="staging" instanceDir="conf/staging/" dataDir="../../data/staging/"/>
  </cores>
</solr>




[jira] Commented: (SOLR-2412) Multipath hierarchical faceting

2011-03-16 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007411#comment-13007411
 ] 

Toke Eskildsen commented on SOLR-2412:
--

The syntax for calling is kept close to SOLR-64 and SOLR-792. The essential 
commands are {{qt=exprh&efacet=true}} to activate faceting, and 
{{efacet.hierarchical=true&efacet.field=mypath}} for hierarchical faceting.

Sorting is controlled with {{efacet.sort=count|index|locale}}. If locale is 
chosen, the locale is selected with {{efacet.sort.locale=da}}. The result set 
is limited with {{efacet.hierarchical.levels=99}} and {{efacet.limit=100}} to 
control the maximum depth and the maximum number of entries at each level.
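
The parameters described above could be assembled into a request query string 
along these lines. This is only a sketch: the class and method names are made 
up for illustration, and the parameter names are simply the ones quoted in 
this comment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class EfacetUrlSketch {
    /** Joins parameters as key=value pairs separated by '&', URL-encoding each value. */
    static String queryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    /** Parameters for a hierarchical facet request on one field (names as quoted in the comment). */
    static Map<String, String> hierarchicalParams(String field, int levels, int limit) {
        Map<String, String> params = new LinkedHashMap<>();  // keeps insertion order
        params.put("q", "*:*");
        params.put("rows", "0");
        params.put("qt", "exprh");
        params.put("efacet", "true");
        params.put("efacet.field", field);
        params.put("efacet.hierarchical", "true");
        params.put("efacet.hierarchical.levels", Integer.toString(levels));
        params.put("efacet.limit", Integer.toString(limit));
        return params;
    }

    public static void main(String[] args) {
        System.out.println("http://localhost:8983/solr/select/?"
            + queryString(hierarchicalParams("path_ss", 99, 10)));
    }
}
```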

Example:
{code}
http://localhost:8983/solr/select/?q=*:*&rows=0&fl=id&indent=0n&qt=exprh&efacet=true&efacet.field=path_ss&efacet.hierarchical=true&efacet.hierarchical.levels=99&efacet.limit=10
{code}

{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">204</int>
</lst>
<result name="response" numFound="100" start="0">
  <doc>
    <str name="id">1</str>
  </doc>
</result>
<lst name="efacet_counts">
  <lst name="efacet_fields">
    <lst name="path_ss">
      <str name="field">path_ss</str>
      <lst name="paths">
        <long name="recursivecount">100</long>
        <long name="potentialtags">100</long>
        <long name="totaltags">101</long>
        <long name="count">101</long>
        <int name="level">0</int>
        <lst name="sub">
          <lst name="L0_T1">
            <int name="count">1</int>
            <lst name="sub">
              <long name="recursivecount">9901</long>
              <long name="potentialtags">9901</long>
              <long name="totaltags">103</long>
              <long name="count">103</long>
              <int name="level">1</int>
              <lst name="sub">
                <lst name="L1_T1">
                  <int name="count">1</int>
                  <lst name="sub">
                    <long name="recursivecount">97</long>
                    <long name="potentialtags">97</long>
                    <long name="totaltags">97</long>
                    <long name="count">97</long>
                    <int name="level">2</int>
                    <lst name="sub">
                      <lst name="L2_T1">
                        <int name="count">1</int>
                      </lst>
...
{code}

I'm currently doing some performance (memory and speed) comparisons of SOLR-64, 
SOLR-792 and SOLR-2412, which will be added later.

 Multipath hierarchical faceting
 ---

 Key: SOLR-2412
 URL: https://issues.apache.org/jira/browse/SOLR-2412
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 4.0
 Environment: Fast IO when huge hierarchies are used
Reporter: Toke Eskildsen
  Labels: contrib, patch
 Attachments: SOLR-2412.patch


 Hierarchical faceting with slow startup, low memory overhead and fast 
 response. Distinguishing features as compared to SOLR-64 and SOLR-792 are
   * Multiple paths per document
   * Query-time analysis of the facet-field; no special requirements for 
 indexing besides retaining separator characters in the terms used for faceting
   * Optional custom sorting of tag values
   * Recursive counting of references to tags at all levels of the output
 This is a shell around LUCENE-2369, making it work with the Solr API. The 
 underlying principle is to reference terms by their ordinals and create an 
 index wide documents to tags map, augmented with a compressed representation 
 of hierarchical levels.


Facing Problem in making query for File Based Spell Checker

2011-03-16 Thread Saurabh Srivastava

Hello Guys,

I am facing a problem querying the file-based spell checker.
The following is the spell checker configuration:

<lst name="spellchecker">
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="name">file</str>
  <str name="sourceLocation">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="spellcheckIndexDir">./spellcheckerFile</str>
</lst>

I am not able to access the spellings.txt file in spite of doing all the
configuration described on the Solr website. Please help me.


Regards
Saurabh Srivastava


[jira] Commented: (SOLR-2430) Swapping cores with persistent switched on should save swapped core to defaultCoreName


[ 
https://issues.apache.org/jira/browse/SOLR-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007446#comment-13007446
 ] 

Mark Miller commented on SOLR-2430:
---

How about calling persist after calling swap?



[jira] Created: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations

SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain 
situations
---

 Key: LUCENE-2970
 URL: https://issues.apache.org/jira/browse/LUCENE-2970
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0


In an application of mine, I experienced some very slow query times with finite 
automata (all the DFAs are acyclic).

It turned out that the slowdown is caused by terrible runtime in 
SpecialOperations.isFinite -- this is used to determine whether the DFA is 
acyclic or not.

(In this case I am talking about up to minutes of CPU time.)



[jira] Updated: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations


 [ 
https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2970:


Attachment: LUCENE-2970.patch

Attached is a patch. Imagine a regexp with lots of optionals, e.g. 
[abcd]e?f?[gh]a?b? ...

In this case the code is not linear in the number of states. If we are at state 
A and it has a transition to B, we determine that B is finite; if we are later 
at C and it leads to B too, we need not determine whether B is finite again, as 
we already did so. So I keep a visited set for this.

Additionally I changed it to use a BitSet instead of a HashSet, which helps the 
speed (but only by a constant factor).
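
The idea described above can be sketched roughly as follows. This is not the 
attached patch: the automaton is reduced to a plain int[][] of target states, 
and all names are hypothetical. It only shows the combination of a "path" 
bit set (states on the current depth-first path) and a "visited" bit set 
(states already proven cycle-free).

```java
import java.util.BitSet;

final class AcyclicCheckSketch {
    /** Returns true if no cycle is reachable from {@code start}; transitions[s] lists the targets of s. */
    static boolean isAcyclic(int[][] transitions, int start) {
        int n = transitions.length;
        return visit(transitions, start, new BitSet(n), new BitSet(n));
    }

    private static boolean visit(int[][] transitions, int state, BitSet path, BitSet visited) {
        path.set(state);  // state is on the current depth-first path
        for (int target : transitions[state]) {
            if (path.get(target)) {
                return false;  // we came back to a state on our own path: cycle
            }
            // Only recurse into states that have not been fully explored before.
            if (!visited.get(target) && !visit(transitions, target, path, visited)) {
                return false;
            }
        }
        path.clear(state);
        visited.set(state);  // fully explored and known to be cycle-free
        return true;
    }

    /** Chain in which every state may skip the next one: few states, exponentially many paths. */
    static int[][] optionalChain(int n) {
        int[][] t = new int[n][];
        for (int i = 0; i < n; i++) {
            if (i + 2 < n) {
                t[i] = new int[] {i + 1, i + 2};
            } else if (i + 1 < n) {
                t[i] = new int[] {i + 1};
            } else {
                t[i] = new int[0];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(isAcyclic(optionalChain(200), 0));            // prints true
        System.out.println(isAcyclic(new int[][] {{1}, {2}, {0}}, 0));   // prints false
    }
}
```

Without the visited set, the chain of 200 states would be walked once per 
path rather than once per state, which is where the blow-up comes from.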

I took the old code, dumped it into AutomatonTestUtil as isFiniteSimple and 
the test just generates random automata and compares this versus the new 
implementation.

I'd appreciate any reviews/suggestions any automaton-hackers want to give here.




[jira] Commented: (SOLR-2430) Swapping cores with persistent switched on should save swapped core to defaultCoreName

2011-03-16 Thread bidorbuy (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007457#comment-13007457
 ] 

bidorbuy commented on SOLR-2430:


I don't think this is necessary as solr.xml has persistent=true set.

Before the swap the admin interface shows:
cwd=/home/prodza/jetty SolrHome=/home/prodza/solr/conf/primary/ 

and the solr.xml looks like this:
-rw-rw-r-- 1 prodza prodza  348 Mar 11 22:19 solr.xml
[prodza@localhost solr]$ cat solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true" defaultCoreName="primary">
    <core name="primary" instanceDir="conf/primary/" dataDir="../../data/primary/"/>
    <core name="staging" instanceDir="conf/staging/" dataDir="../../data/staging/"/>
  </cores>
</solr>

After the swap (from primary to staging) via: 
http://MYHOST:8983/solr/admin/cores?action=SWAP&core=primary&other=staging the 
admin-interface shows:
cwd=/home/prodza/jetty SolrHome=/home/prodza/solr/conf/staging/ 

The solr.xml has been updated (see filestamp):

-rw-rw-r-- 1 prodza prodza  348 Mar 11 22:26 solr.xml
[prodza@localhost solr]$ cat solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true" defaultCoreName="primary">
    <core name="primary" instanceDir="conf/staging/" dataDir="../../data/staging/"/>
    <core name="staging" instanceDir="conf/primary/" dataDir="../../data/primary/"/>
  </cores>
</solr>

And the solr-log shows:

2011-03-11 22:26:11,421  INFO [solr.core.CoreContainer] [qtp2026549-22] : 
swaped:  with staging
2011-03-11 22:26:11,421  INFO [solr.core.CoreContainer] [qtp2026549-22] : 
Persisting cores config to /home/prodza/solr/solr.xml




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-16 Thread Steven Rowe (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007482#comment-13007482
]

Steven Rowe commented on LUCENE-2960:
-

{quote}
bq. How about an IWC base class, extended by IWCinit and IWClive. IWCinit has
setters for everything, and IW.getConfig() returns IWClive, which has no
setters for things you can't set on the fly.

I tried to implement this, but couldn't figure out a way to avoid code and
javadoc duplication and/or separation for the live setters, which need to be on
both the init and live versions.
{quote}

An annotation processor that looks for @Live annotations on setters, then
generates source for a LiveIWC class, an instance of which would be returned by
IW.getConfig(), would solve the duplication/separation problem. No extension
required: LiveIWC could forward all getters and the live setters to a cloned
IWC.
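
To make the @Live idea a little more concrete, here is a rough sketch. It 
swaps in runtime reflection for the compile-time annotation processor 
proposed above, purely because that fits in a few lines. The @Live annotation, 
the ConfigSketch class, and the choice of which setter is marked live are all 
assumptions for illustration; nothing here is the real IndexWriterConfig API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical marker for setters that may be called on an open writer. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Live {}

/** Hypothetical stand-in for the config class; only the annotation placement matters. */
class ConfigSketch {
    private double ramBufferSizeMB = 32.0;   // arbitrary illustrative value
    private int termIndexInterval = 64;      // arbitrary illustrative value

    @Live
    public ConfigSketch setRAMBufferSizeMB(double mb) {
        this.ramBufferSizeMB = mb;
        return this;
    }

    // Not marked @Live: in this sketch it may only be set before the writer is opened.
    public ConfigSketch setTermIndexInterval(int interval) {
        this.termIndexInterval = interval;
        return this;
    }

    public double getRAMBufferSizeMB() { return ramBufferSizeMB; }
    public int getTermIndexInterval() { return termIndexInterval; }
}

final class LiveSetterScan {
    /** Names of the public setters carrying @Live, i.e. those a generated live view would forward. */
    static List<String> liveSetters(Class<?> configClass) {
        List<String> names = new ArrayList<>();
        for (Method m : configClass.getMethods()) {
            if (m.getName().startsWith("set") && m.isAnnotationPresent(Live.class)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);  // getMethods() has no guaranteed order
        return names;
    }

    public static void main(String[] args) {
        System.out.println(liveSetters(ConfigSketch.class));  // prints [setRAMBufferSizeMB]
    }
}
```

A real annotation processor would do the same scan at compile time and emit 
the source of the live class, so javadoc and setter bodies would exist only 
once.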

Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
--

Key: LUCENE-2960
URL: https://issues.apache.org/jira/browse/LUCENE-2960
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shay Banon
Priority: Blocker
Fix For: 3.1, 4.0

Attachments: LUCENE-2960.patch

In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk.
It would be great to be able to control that on a live IndexWriter. Two other
methods that would be great to bring back are setTermIndexInterval and
setReaderTermsIndexDivisor. Most of the other setters can actually be set on
the MergePolicy itself, so no need for setters for those (I think).


[jira] Commented: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations

2011-03-16 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007488#comment-13007488
 ] 

Michael McCandless commented on LUCENE-2970:


Patch looks correct to me!

The algo you impl'd is the same one described in the Cormen, Leiserson, Rivest 
Algorithms book, as a side effect of doing a depth-first walk through the DFA.  
Their description of DFS colors the nodes -- white is unvisited, black is 
visited, gray is being visited (i.e. on the current path).  A DFA then has a 
cycle if ever you recurse and find a gray node.

In your patch, the combination of path and visited maps to these colors, 
and you detect a cycle when path is set and visited is not.

Maybe rename the test-only isFiniteSimple to isFiniteSLOW or something?

Does the new random test case tend not to hit the super-slow cases...?



[jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount

2011-03-16 Thread Toke Eskildsen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007497#comment-13007497
 ] 

Toke Eskildsen commented on SOLR-2403:
--

Dividing by shard count is fairly risky. An example could be the shards
{code}
Shard 1: A(9) B(6) C(10) D(8)
Shard 2: A(4) B(5) C(4) D(3)
{code}
where the request of the top-3 elements with mincount=5 from each shard would 
give the merged result
{code}
B(11) C(10)
{code}
where the correct result would be
{code}
A(13) B(11) C(14) D(11)
{code}
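
The example above can be replayed with a small simulation. It is only a 
sketch under stated assumptions: the overall mincount is taken to be 10 (so 
that dividing by the 2 shards gives the mincount=5 mentioned above), terms are 
returned in lexical order, and the merge is a single pass with no follow-up 
requests to the shards. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class ShardMergeSketch {
    /** First {@code limit} terms in lexical order whose count reaches {@code shardMinCount}. */
    static Map<String, Integer> shardRequest(SortedMap<String, Integer> shard, int shardMinCount, int limit) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            if (out.size() == limit) break;
            if (e.getValue() >= shardMinCount) out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    /** Single-pass merge: per-shard mincount = minCount / number of shards, final filter on minCount. */
    static SortedMap<String, Integer> mergeWithDividedMinCount(
            List<SortedMap<String, Integer>> shards, int minCount, int limit) {
        int shardMinCount = Math.max(1, minCount / shards.size());
        SortedMap<String, Integer> sums = new TreeMap<>();
        for (SortedMap<String, Integer> shard : shards) {
            shardRequest(shard, shardMinCount, limit)
                .forEach((term, count) -> sums.merge(term, count, Integer::sum));
        }
        sums.values().removeIf(count -> count < minCount);
        return sums;
    }

    /** Reference result: sum every term over all shards, then apply minCount. */
    static SortedMap<String, Integer> exact(List<SortedMap<String, Integer>> shards, int minCount) {
        SortedMap<String, Integer> sums = new TreeMap<>();
        for (SortedMap<String, Integer> shard : shards) {
            shard.forEach((term, count) -> sums.merge(term, count, Integer::sum));
        }
        sums.values().removeIf(count -> count < minCount);
        return sums;
    }

    /** Builds a shard from alternating term, count arguments. */
    static SortedMap<String, Integer> shard(Object... termsAndCounts) {
        SortedMap<String, Integer> m = new TreeMap<>();
        for (int i = 0; i < termsAndCounts.length; i += 2) {
            m.put((String) termsAndCounts[i], (Integer) termsAndCounts[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<SortedMap<String, Integer>> shards = Arrays.asList(
            shard("A", 9, "B", 6, "C", 10, "D", 8),
            shard("A", 4, "B", 5, "C", 4, "D", 3));
        System.out.println(mergeWithDividedMinCount(shards, 10, 3));  // prints {B=11, C=10}
        System.out.println(exact(shards, 10));                        // prints {A=13, B=11, C=14, D=11}
    }
}
```

A is lost because shard 2 never reports its A(4), and C is undercounted for 
the same reason; D is cut off by the per-shard limit.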

The problem with using mincount=1 for each shard call is of course that the 
single-shard result sets need to be humongous in order to ensure that the 
correct values are returned when the field contains many values with low counts 
and few values with high counts. With shards like
{code}
Shard 1: A(1) B(1) C(1) D(1) E(1) F(9) G(1) H(1)
Shard 2: A(1) B(1) C(1) D(1) E(1) F(1) G(1) H(10)
{code}
and a request for mincount=10, all terms must be returned from both shards in 
order to get the result
{code}
F(10) H(11)
{code}

As you, Yonik, point out, a variant of the problem exists when sorting on 
count. However, for count it is mitigated by the fact that the results from the 
individual shards are sorted by the selecting key (count). This means that the 
chance of missing or miscounting tags is low and can be lowered further by 
relatively little over-requesting.

With lexical sorting, the selecting key (count again) is independent of the 
sorting key. Over-requesting helps, but only linearly in the fraction of the 
full result set from each shard that is requested. Furthermore, the need for 
over-requesting grows with the number of shards, as the overlapping hills can 
be smaller while still summing up to mincount.

I do not have any real solution for the problem. One minor improvement would be 
a collector that kept collecting terms with a mincount=y until limit=n or the 
number of collected terms with mincount=x was equal to m, where x is the 
original mincount and y is dependent on the number of shards. This would at 
least stop the collection process when the result set was guaranteed to contain 
enough values above the given threshold. It would work well with spikes but 
poorly with hills just below mincount x and it would still not guarantee 
correctness of the sums of the counts, only correctness of the terms.

 Problem with facet.sort=lex, shards, and facet.mincount
 ---

 Key: SOLR-2403
 URL: https://issues.apache.org/jira/browse/SOLR-2403
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0
 Environment: RHEL5, Ubuntu 10.04
Reporter: Peter Cline

 I tested this on a recent trunk snapshot (2/25), haven't verified with 3.1 or 
 1.4.1.  I can if necessary and update.
 Solr is not returning the proper number of facet values when sorting 
 alphabetically, using distributed search, and using a facet.mincount that 
 excludes some of the values in the first facet.limit values.
 Easiest explained by example.  Sorting alphabetically, the first 20 values 
 for my subject_facet field have few documents.  19 facet values have only 1 
 document associated, and 1 has 2 documents.  There are plenty after that have 
 more than 2.
 {code}
 http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2
 {code}
 comes back with the expected 20 facet values with >= 2 documents associated.
 If I add a shards parameter that points back to itself, the result is 
 different.
 {code}
 http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr
 {code}
 comes back with only 1 facet value: the single value in the first 20 that had 
 more than 1 document.  
 It appears to me that mincount is ignored when doing the original query to 
 the shards, then applied afterwards.
 Let me know if you need any more info.  
 Thanks,
 Peter


[jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount

2011-03-16 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007508#comment-13007508
 ] 

Yonik Seeley commented on SOLR-2403:


bq. Dividing by shard count is fairly risky. 

Actually, it seems like it should help? (when mincount is relatively high at 
least).

Let's take your example of facet.mincount=10, facet.limit=2, facet.sort=index
{code}
Shard 1: A(1) B(1) C(1) D(1) E(1) F(9) G(1) H(1)
Shard 2: A(1) B(1) C(1) D(1) E(1) F(1) G(1) H(10)
{code}

mincount / nShards = 5, so the shard requests sent will be along the lines of
facet.mincount=5, facet.limit=5, facet.sort=index  (some over-requesting)
and we will get back
F(9), H(10)

The second phase (facet refinement... to true up counts) will retrieve counts 
from each shard for constraints in the list that it didn't return the first 
time.
So shard1 will be asked about H, and shard2 will be asked about F.

The final response will be F(10),H(11)
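
The two phases walked through above can be simulated in a few lines. This is 
a sketch only, not Solr's implementation: shards are plain sorted maps, the 
refinement request is modelled as a direct lookup, the final facet.limit is 
not applied, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class FacetRefinementSketch {
    /** First {@code limit} terms in index order whose count reaches {@code shardMinCount}. */
    static Map<String, Integer> shardRequest(SortedMap<String, Integer> shard, int shardMinCount, int limit) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            if (out.size() == limit) break;
            if (e.getValue() >= shardMinCount) out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    /** Phase 1 collects candidates with mincount / nShards; phase 2 asks each shard for what it left out. */
    static SortedMap<String, Integer> twoPhase(List<SortedMap<String, Integer>> shards, int minCount, int shardLimit) {
        int shardMinCount = Math.max(1, minCount / shards.size());
        List<Map<String, Integer>> firstPhase = new ArrayList<>();
        SortedSet<String> candidates = new TreeSet<>();
        for (SortedMap<String, Integer> shard : shards) {
            Map<String, Integer> response = shardRequest(shard, shardMinCount, shardLimit);
            firstPhase.add(response);
            candidates.addAll(response.keySet());
        }
        SortedMap<String, Integer> totals = new TreeMap<>();
        for (int i = 0; i < shards.size(); i++) {
            for (String term : candidates) {
                Integer count = firstPhase.get(i).get(term);
                if (count == null) {
                    // Refinement: this shard did not return the term, so ask it directly.
                    count = shards.get(i).getOrDefault(term, 0);
                }
                totals.merge(term, count, Integer::sum);
            }
        }
        totals.values().removeIf(count -> count < minCount);
        return totals;
    }

    /** Builds a shard from alternating term, count arguments. */
    static SortedMap<String, Integer> shard(Object... termsAndCounts) {
        SortedMap<String, Integer> m = new TreeMap<>();
        for (int i = 0; i < termsAndCounts.length; i += 2) {
            m.put((String) termsAndCounts[i], (Integer) termsAndCounts[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<SortedMap<String, Integer>> shards = Arrays.asList(
            shard("A", 1, "B", 1, "C", 1, "D", 1, "E", 1, "F", 9, "G", 1, "H", 1),
            shard("A", 1, "B", 1, "C", 1, "D", 1, "E", 1, "F", 1, "G", 1, "H", 10));
        System.out.println(twoPhase(shards, 10, 5));  // prints {F=10, H=11}
    }
}
```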

bq. Over-requesting helps, but only linear to the fraction of the full 
result-set from each shard that is requested.

Yes, I think you're correct that over-requesting is less useful for sort=index 
than sort=count.
Luckily, we can fix the mincount=1 problem and get exact answers for that case, 
which is the most important case.  I think mincount > 1 is relatively rare.






[jira] Commented: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations


[ 
https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007511#comment-13007511
 ] 

Robert Muir commented on LUCENE-2970:
-

bq. A DFA then has a cycle if ever you recurse and find a gray node

Well, it seems it might work for an NFA too? Though I'm not sure how good the 
NFAs generated by AutomatonTestUtil.randomAutomaton are.
If all else fails we can determinize as a side effect (this won't hurt Lucene), 
but I'd like to know for sure, and to send the patch upstream.

{quote}
Maybe rename the test-only isFiniteSimple to isFiniteSLOW or something?

Does the new random test case tend not to hit the super-slow cases...?
{quote}

The test definitely got faster, but maybe the type of DFAs i generate are not 
represented fairly by the random generator? In other words they are 
worst-case for the old method, but they are reasonable as far as queries, 
finite and contained as far as the number of terms they accept.
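
For reference, the gray-node check quoted above can be sketched as a three-color depth-first search. This is only an illustration, not the code in LUCENE-2970.patch: StateGraph and its methods are made-up names, the graph is a bare adjacency list rather than Lucene's Automaton, and it assumes dead states were already removed so that any reachable cycle means the accepted language is infinite.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a bare state graph, not Lucene's Automaton class. */
final class StateGraph {
    private static final int WHITE = 0; // not visited yet
    private static final int GRAY = 1;  // on the current recursion path
    private static final int BLACK = 2; // fully explored, known cycle-free

    private final List<List<Integer>> edges = new ArrayList<>();

    StateGraph(int numStates) {
        for (int i = 0; i < numStates; i++) {
            edges.add(new ArrayList<>());
        }
    }

    void addTransition(int from, int to) {
        edges.get(from).add(to);
    }

    /** True if a cycle is reachable from the given start state. */
    boolean hasCycleFrom(int start) {
        return visit(start, new int[edges.size()]);
    }

    private boolean visit(int state, int[] color) {
        color[state] = GRAY;
        for (int next : edges.get(state)) {
            if (color[next] == GRAY) {
                return true; // recursed into a state still on the path
            }
            if (color[next] == WHITE && visit(next, color)) {
                return true;
            }
            // BLACK states are skipped, so each state is explored once
        }
        color[state] = BLACK;
        return false;
    }
}
```

Nothing here depends on a state having at most one transition per label, so the same walk applies to an NFA. Each state and transition is visited once, which is where the speedup over re-exploring shared sub-paths would come from.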




 SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain 
 situations
 ---

 Key: LUCENE-2970
 URL: https://issues.apache.org/jira/browse/LUCENE-2970
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2970.patch


 in an application of mine, i experienced some very slow query times with 
 finite automata (all the DFAs are acyclic)
 It turned out, the slowdown is some terrible runtime in 
 SpecialOperations.isFinite -- this is used to determine if the DFA is 
 acyclic or not.
 (in this case I am talking about even up to minutes of cpu).


[jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount

2011-03-16 Thread Toke Eskildsen (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007513#comment-13007513
]

Toke Eskildsen commented on SOLR-2403:
--

My first example was hills, while the second was spikes, where I agree that the
divide-mincount-by-shard# or something similar works well. As it comes down to
the distribution of counts vs. mincount, we seem to be left with the unsatisfying
answer: "it depends, but avoid using mincounts around the average count".

I forgot about the refinement phase. That would ensure that my suggestion of a
collector with two separate mincounts would return the correct result for
counts as well as terms, as long as it did not exceed the given limits. Alas,
it still only helps somewhat and might not be worth the hassle.

Problem with facet.sort=lex, shards, and facet.mincount
---

Key: SOLR-2403
URL: https://issues.apache.org/jira/browse/SOLR-2403
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 4.0
Environment: RHEL5, Ubuntu 10.04
Reporter: Peter Cline

I tested this on a recent trunk snapshot (2/25), haven't verified with 3.1 or
1.4.1. I can if necessary and update.
Solr is not returning the proper number of facet values when sorting
alphabetically, using distributed search, and using a facet.mincount that
excludes some of the values in the first facet.limit values.
Easiest explained by example. Sorting alphabetically, the first 20 values
for my subject_facet field have few documents. 19 facet values have only 1
document associated, and 1 has 2 documents. There are plenty after that have
more than 2.
{code}
http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2
{code}
comes back with the expected 20 facet values with >= 2 documents associated.
If I add a shards parameter that points back to itself, the result is
different.
{code}
http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr
{code}
comes back with only 1 facet value: the single value in the first 20 that had
more than 1 document.
It appears to me that mincount is ignored when doing the original query to
the shards, then applied afterwards.
Let me know if you need any more info.
Thanks,
Peter
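
The suspected behaviour in the report above can be reproduced in miniature. The sketch below is not Solr code: LexFacetSketch and its methods are made-up names, and the data is invented to match the description (19 values with 1 document and 1 value with 2 among the first 20, then plenty with more). It only contrasts applying mincount before the limit with applying it after.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: contrasts two orders of applying limit and mincount. */
final class LexFacetSketch {

    /** Apply mincount first, then keep the first `limit` terms in lexical order. */
    static List<String> filterThenLimit(TreeMap<String, Integer> counts, int limit, int mincount) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (out.size() == limit) {
                break;
            }
            if (e.getValue() >= mincount) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Take the first `limit` terms ignoring mincount, then filter (the suspected sharded path). */
    static List<String> limitThenFilter(TreeMap<String, Integer> counts, int limit, int mincount) {
        List<String> out = new ArrayList<>();
        int seen = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (seen++ == limit) {
                break;
            }
            if (e.getValue() >= mincount) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Invented data: a00..a19 have 1 document (a07 has 2), b00..b29 have 5. */
    static TreeMap<String, Integer> sampleCounts() {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < 20; i++) {
            counts.put("a" + (i < 10 ? "0" : "") + i, i == 7 ? 2 : 1);
        }
        for (int i = 0; i < 30; i++) {
            counts.put("b" + (i < 10 ? "0" : "") + i, 5);
        }
        return counts;
    }
}
```

With limit 20 and mincount 2, the first method yields 20 terms while the second yields only the single term a07, which matches the two results reported above.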


[jira] Commented: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations

2011-03-16 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007517#comment-13007517
 ] 

Michael McCandless commented on LUCENE-2970:


bq. well it seems it might work for an NFA too?

Sorry, yes -- the algo doesn't care if it's N or D.  It works for both.

 SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain 
 situations
 ---

 Key: LUCENE-2970
 URL: https://issues.apache.org/jira/browse/LUCENE-2970
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2970.patch


 in an application of mine, i experienced some very slow query times with 
 finite automata (all the DFAs are acyclic)
 It turned out, the slowdown is some terrible runtime in 
 SpecialOperations.isFinite -- this is used to determine if the DFA is 
 acyclic or not.
 (in this case I am talking about even up to minutes of cpu).


[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-03-16 Thread Ahmet Arslan (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007516#comment-13007516
]

Ahmet Arslan commented on SOLR-1499:

Hi Lance,

I applied the patch to the latest trunk. It required some changes though.
I pointed it at a Solr URL (version 1.4.0) in order to upgrade from 1.4.0 to trunk.

I received :

Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)

What would be a workaround for this?

SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via
SolrJ
-

Key: SOLR-1499
URL: https://issues.apache.org/jira/browse/SOLR-1499
Project: Solr
Issue Type: New Feature
Components: contrib - DataImportHandler
Reporter: Lance Norskog
Assignee: Erik Hatcher
Fix For: Next

Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch,
SOLR-1499.patch, SOLR-1499.patch

The SolrEntityProcessor queries an external Solr instance. The Solr documents
returned are unpacked and emitted as DIH fields.
The SolrEntityProcessor uses the following attributes:
* solr='http://localhost:8983/solr/sms'
** This gives the URL of the target Solr instance.
*** Note: the connection to the target Solr uses the binary SolrJ format.
* query='Jefferson&sort=id+asc'
** This gives the base query string use with Solr. It can include any
standard Solr request parameter. This attribute is processed under the
variable resolution rules and can be driven in an inner stage of the indexing
pipeline.
* rows='10'
** This gives the number of rows to fetch per request.
** The SolrEntityProcessor always fetches every document that matches the
request.
* fields='id,tag'
** This selects the fields to be returned from the Solr request.
** These must also be declared as field elements.
** As with all fields, template processors can be used to alter the contents
to be passed downwards.
* timeout='30'
** This limits the query to 5 seconds. This can be used as a fail-safe to
prevent the indexing session from freezing up. By default the timeout is 5
minutes.
Limitations:
* Solr errors are not handled correctly.
* Loop control constructs have not been tested.
* Multi-valued returned fields have not been tested.
The unit tests give examples of how to use it as the root entity and an inner
entity.


[jira] Commented: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations


[ 
https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007524#comment-13007524
 ] 

Robert Muir commented on LUCENE-2970:
-

Ok, i feel better now.

I think i have an explanation why the test doesn't hang.
I think it's because the automata we generate are pretty damn small (mine are 
significantly larger).
I think for our testing this is just fine, and actually desirable, as it helps 
debugging.

The only largeish automata lucene tests through this stuff are for levenshtein, 
and we supply 'true' 
here (since we know its finite) and avoid this method entirely... and even 
those are special in that
they always have the same general shape


 SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain 
 situations
 ---

 Key: LUCENE-2970
 URL: https://issues.apache.org/jira/browse/LUCENE-2970
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2970.patch


 in an application of mine, i experienced some very slow query times with 
 finite automata (all the DFAs are acyclic)
 It turned out, the slowdown is some terrible runtime in 
 SpecialOperations.isFinite -- this is used to determine if the DFA is 
 acyclic or not.
 (in this case I am talking about even up to minutes of cpu).


[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-16 Thread Michael McCandless (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007528#comment-13007528
]

Michael McCandless commented on LUCENE-2960:

{quote}
bq. Oh yeah. But then we'd clone the full IWC on every set... this seems like
overkill in the name of purity.

So what? What exactly is overkill? Few wasted bytes and CPU ns for an object
that's created a couple of times during application lifetime?
There are also builders, which are very similar to what Steven is proposing.
{quote}

I don't like that this'd be an O(N*M) cost API when you use it. Sure,
N and M are tiny, and you use this API very rarely, but I still don't
like it ;)

In addition, this is all in the name of purity, which as far
as I can see has no real value besides purity. It's... incestuous.
And, I'm a pragmatist, I guess.

{quote}
An annotation processor that looks for @Live annotations on setters, then
generates source for a LiveIWC class, an instance of which would be returned by
IW.getConfig(), would solve the duplication/separation problem. No extension
required: LiveIWC could forward all getters and the live setters to a cloned
IWC.
{quote}

I think this is overkill? (Ie to have @Live compile to LiveIWC vs
InitIWC). Though, @Live would be nice for jdocs?

bq. You win the fact that this is such an expert thing, and it should not
confuse 99% of users who won't need to change these settings in a live way.

Right -- simple things should be simple and complex things should be
possible.

The current patch achieves this -- the 99% of simple users that just
want to config IW and create it find all of the config in one place.
The 1% complex cases (need to change live settings) are able to do so,
but must read the jdocs for each setter to verify it's supported. The
API should be designed around the simple users not the complex ones,
as the current patch does.

So... I think the current patch is ready to commit (except, I'll
remove the /html tag for infoStream defaultInfoStream).
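
For readers following the thread, the LiveIWC alternative quoted above might look roughly like the sketch below. It is not the current patch and not Lucene API: InitConfigSketch and LiveConfigSketch are made-up names, and which settings count as live (here the RAM buffer size) or init-only (here a made-up analyzer name) is an assumption for the example.

```java
/** Hypothetical stand-in for IndexWriterConfig; not the Lucene class. */
final class InitConfigSketch {
    private String analyzerName = "standard"; // init-only in this sketch
    private double ramBufferSizeMB = 16.0;    // treated as live in this sketch

    InitConfigSketch() {
    }

    /** Copy constructor, so a writer can keep its own private clone. */
    InitConfigSketch(InitConfigSketch other) {
        this.analyzerName = other.analyzerName;
        this.ramBufferSizeMB = other.ramBufferSizeMB;
    }

    InitConfigSketch setAnalyzerName(String name) {
        this.analyzerName = name;
        return this;
    }

    InitConfigSketch setRamBufferSizeMB(double mb) {
        this.ramBufferSizeMB = mb;
        return this;
    }

    String getAnalyzerName() {
        return analyzerName;
    }

    double getRamBufferSizeMB() {
        return ramBufferSizeMB;
    }
}

/** What a getConfig() call could hand out: every getter, but only the live setter. */
final class LiveConfigSketch {
    private final InitConfigSketch delegate;

    LiveConfigSketch(InitConfigSketch init) {
        this.delegate = new InitConfigSketch(init); // clone once, when the writer is created
    }

    String getAnalyzerName() {
        return delegate.getAnalyzerName();
    }

    double getRamBufferSizeMB() {
        return delegate.getRamBufferSizeMB();
    }

    LiveConfigSketch setRamBufferSizeMB(double mb) {
        delegate.setRamBufferSizeMB(mb);
        return this;
    }
}
```

In this shape the compiler rejects a call to a non-live setter on the live view, at the cost of a second class that has to be kept in sync with the first. That duplication is the trade-off being argued about in this issue.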

Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
--

Key: LUCENE-2960
URL: https://issues.apache.org/jira/browse/LUCENE-2960
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shay Banon
Priority: Blocker
Fix For: 3.1, 4.0

Attachments: LUCENE-2960.patch


Re: FieldType API change proposal -- SOLR-2423

2011-03-16 Thread Ryan McKinley

any concerns with this proposal?  If not, i would like to commit soon.

After 3.1 is released, i would merge with 3.x branch and add a deprecation.



On Mon, Mar 14, 2011 at 12:57 PM, Ryan McKinley ryan...@gmail.com wrote:
 the default implementation would just use toString()

 For things that could use the type directly (Date/Numbers) they check
 instanceof.

 This is actually identical to what currently happens in
 DocumentBuilder, but would happen in the FieldType and would not check
 everything if it is:
 1. instanceof BinaryField
 2. instanceof Date

 ryan


 On Mon, Mar 14, 2011 at 12:52 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Mon, Mar 14, 2011 at 12:45 PM, Ryan McKinley ryan...@gmail.com wrote:
 Any opinions on this?

 I've been focused on getting this 3.1 release out (reviewing/fixing
 docs, packaging, etc).
 I'm not sure about Object... does that mean most FieldTypes would be
 doing instanceof checks?

 -Yonik
 http://lucidimagination.com


 thanks
 ryan


 On Sat, Mar 12, 2011 at 2:29 AM, Ryan McKinley ryan...@gmail.com wrote:
 I think FieldType should take an Object input rather than String --
 this gives FieldTypes the option of using (and reusing) explicit types
 in addition to String.  For embedded apps that fill SolrInputDocuments
 with real objects, the fields can use objects directly -- this means
 that Date does not have to get converted to a String and then back to
 a Date.

 This is a major API change, but I think the value is worth the trouble.

 Thoughts?

 ryan
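
As a rough picture of the proposal above (toString() by default, instanceof only in types that can use the object directly), here is a minimal sketch. It is not Solr's FieldType API: FieldTypeSketch, DateFieldSketch, toInternal and toMillis are made-up names, and the epoch-millis internal form is an assumption chosen to keep the example short.

```java
import java.time.Instant;
import java.util.Date;

/** Hypothetical base type: the default just uses the value's string form. */
class FieldTypeSketch {
    String toInternal(Object value) {
        return value.toString();
    }
}

/** Hypothetical date type: uses a Date directly, parses anything else. */
class DateFieldSketch extends FieldTypeSketch {
    @Override
    String toInternal(Object value) {
        return Long.toString(toMillis(value));
    }

    static long toMillis(Object value) {
        if (value instanceof Date) {
            // No Date -> String -> Date round trip for embedded callers.
            return ((Date) value).getTime();
        }
        // Assumption: strings arrive as ISO-8601 UTC instants.
        return Instant.parse(value.toString()).toEpochMilli();
    }
}
```

Only the types that can benefit from a concrete class need the instanceof check. Every other type inherits the toString() default and behaves as it would with String input.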



[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

[
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007537#comment-13007537
]

Mark Miller commented on LUCENE-2960:
-

{quote}
The current patch achieves this – the 99% of simple users that just
want to config IW and create it find all of the config in one place.
The 1% complex cases (need to change live settings) are able to do so,
but must read the jdocs for each setter to verify it's supported.
{quote}

The proposed alternatives sound just as good as this? In the proposed
compromises, the 99% of simple users will still have one place to config IW as
well for the avg 'set up front' use case. Perhaps the complex users could also
have an API with a better pattern and it doesn't have to be either/or as you
seem to imply...

An IWC that is 'partially' live and can be changed externally after passing to
the IW is just an inferior pattern plain and simple, and doesn't gain you much.

{quote}
The
API should be designed around the simple users not the complex ones,
as the current patch does.
{quote}

This really depends. If the simple users can be satisfied, and the API can also
be decent for complex users, win/win.

I guess I would place my bets that there will not be a ton of deprecation
loops from settings wanting to be live.

Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
--

Key: LUCENE-2960
URL: https://issues.apache.org/jira/browse/LUCENE-2960
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shay Banon
Priority: Blocker
Fix For: 3.1, 4.0

Attachments: LUCENE-2960.patch


[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007543#comment-13007543
 ] 

Mark Miller commented on LUCENE-2960:
-

Though don't take that I don't agree as a hold up to finishing 3.1.

 Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
 --

 Key: LUCENE-2960
 URL: https://issues.apache.org/jira/browse/LUCENE-2960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon
Priority: Blocker
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2960.patch


 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
 It would be great to be able to control that on a live IndexWriter. Other 
 possible two methods that would be great to bring back are 
 setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
 setters can actually be set on the MergePolicy itself, so no need for setters 
 for those (I think).


[jira] Resolved: (LUCENE-2970) SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain situations


 [ 
https://issues.apache.org/jira/browse/LUCENE-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2970.
-

Resolution: Fixed

Committed revision 1082200.

Thanks for the review Mike!

 SpecialOperations.isFinite can have TERRIBLE TERRIBLE runtime in certain 
 situations
 ---

 Key: LUCENE-2970
 URL: https://issues.apache.org/jira/browse/LUCENE-2970
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2970.patch


 in an application of mine, i experienced some very slow query times with 
 finite automata (all the DFAs are acyclic)
 It turned out, the slowdown is some terrible runtime in 
 SpecialOperations.isFinite -- this is used to determine if the DFA is 
 acyclic or not.
 (in this case I am talking about even up to minutes of cpu).


[jira] Resolved: (SOLR-1822) SEVERE: Unable to move index file from: tempfile to: indexfile


 [ 
https://issues.apache.org/jira/browse/SOLR-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-1822.
---

   Resolution: Duplicate
Fix Version/s: (was: Next)
   4.0
   3.1
 Assignee: Mark Miller

 SEVERE: Unable to move index file from: tempfile to: indexfile
 --

 Key: SOLR-1822
 URL: https://issues.apache.org/jira/browse/SOLR-1822
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: Linux, JDK6,SOLR 1.4
Reporter: wyhw whon
Assignee: Mark Miller
Priority: Critical
 Fix For: 3.1, 4.0

 Attachments: SnapPuller.patch


 The Solr index directory was removed, but I do not know the reason for this.
 Adding the following code at line 577 of SnapPuller.java resolves the bug.
 line 576   
 File indexFileInIndex = new File(indexDir, fname);
 + if (!indexDir.exists()) indexDir.mkdir();
 boolean success = indexFileInTmpDir.renameTo(indexFileInIndex);


[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2011-03-16 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007627#comment-13007627
 ] 

Grant Ingersoll commented on SOLR-1725:
---

bq. As time passes, the case for moving to Java 6 increases.

Solr trunk is on 1.6.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
 Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
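
The ordering and extension rules in the description above can be sketched as follows. This is not code from SOLR-1725.patch: ScriptListSketch and its methods are made-up names, and the handling of blank entries is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: ordering and extension handling for the "scripts" parameter. */
final class ScriptListSketch {

    /** Splits the comma-separated value and sorts the file names lexicographically. */
    static List<String> executionOrder(String scriptsParam) {
        List<String> names = new ArrayList<>();
        for (String part : scriptsParam.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Returns the extension that selects the script language; it is mandatory. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("script file needs an extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }
}
```

The extension returned here is the kind of value the JDK's javax.script.ScriptEngineManager.getEngineByExtension accepts, which is presumably how a JDK6-based implementation would pick the engine.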


Licenses files, Notice files and LUCENE-2952

2011-03-16 Thread Grant Ingersoll

As Robert can no doubt attest, we often scramble to make sure i's are dotted 
and t's are crossed when it comes to filling out LICENSE.txt and NOTICE.txt 
right before releases, thereby burdening the RM with way too much work in 
validating what dependency has which license.  Thus, we've been working to 
resolve this.

In prep for the landing of LUCENE-2952 and to make life easier on release 
managers going forward, we've adopted the following conventions for dealing 
with licenses:

1. For every dependency (i.e. jar file), there needs to be a corresponding 
<file>-LICENSE-<LICENSE_TYPE>.txt file, as in: foo-2.3.1.jar has the 
corresponding foo-LICENSE-BSD.txt file (assuming foo is BSD licensed) in the 
same directory as the jar file.

2.  _IF_ the license requires a NOTICE entry, then there must be a file of the 
name <file>-NOTICE.txt, as in foo-NOTICE.txt.

Failing to meet either one will break the build once L-2952 is committed (which 
should be soon for trunk and will be backported to 3.2).
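
To make convention 1 concrete, here is a rough sketch of the kind of check involved. It is not the LUCENE-2952 validator: LicenseCheckSketch and its methods are made-up names, and the rule for deriving the base name (strip a trailing "-<digit>..." version) is an assumption that will not fit every jar name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: finds jars that lack a matching LICENSE file in one directory. */
final class LicenseCheckSketch {
    // Assumption: the base name is everything before the first "-<digit>" version suffix.
    private static final Pattern JAR = Pattern.compile("^(.+?)(-\\d[^/]*)?\\.jar$");

    static String baseName(String jarFileName) {
        Matcher m = JAR.matcher(jarFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a jar file: " + jarFileName);
        }
        return m.group(1);
    }

    /** Returns the jars in the listing that have no <base>-LICENSE-<TYPE>.txt sibling. */
    static List<String> jarsMissingLicense(List<String> fileNames) {
        List<String> missing = new ArrayList<>();
        for (String file : fileNames) {
            if (!file.endsWith(".jar")) {
                continue;
            }
            String prefix = baseName(file) + "-LICENSE-";
            boolean found = false;
            for (String other : fileNames) {
                if (other.startsWith(prefix) && other.endsWith(".txt")) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(file);
            }
        }
        return missing;
    }
}
```

For a listing of foo-2.3.1.jar, foo-LICENSE-BSD.txt and bar-1.0.jar, only bar-1.0.jar would be reported. A NOTICE check would follow the same pattern, but only for licenses that require one.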

Consider yourself notified.  

-Grant

Re: Solr Cell DataImport Tika handler broken - fails to index Zip file contents

2011-03-16 Thread Chris Hostetter


: I had raised a jira for the Data Import handler part with the patch
: and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.
: The same fix is needed for the Solr Cell as well.
: 
: I can raise a jira and provide the patch for the same, if the patch
: seems good enough.

Jayendra: I'm not very familiar with your patch (or tika!) but by all means 
please open a jira for the bug, even if you are hesitant to work on a 
patch ... if you mention the issues in the comments for one another, 
people will see that they are related.


-Hoss


[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-16 Thread Steven Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007671#comment-13007671
 ] 

Steven Rowe commented on LUCENE-2960:
-

bq. Though don't take that I don't agree as a hold up to finishing 3.1.

+1

 Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
 --

 Key: LUCENE-2960
 URL: https://issues.apache.org/jira/browse/LUCENE-2960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shay Banon
Priority: Blocker
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2960.patch


 In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
 It would be great to be able to control that on a live IndexWriter. Other 
 possible two methods that would be great to bring back are 
 setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
 setters can actually be set on the MergePolicy itself, so no need for setters 
 for those (I think).


[jira] Updated: (SOLR-2382) DIH Cache Improvements

2011-03-16 Thread James Dyer (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

James Dyer updated SOLR-2382:
-

Attachment: SOLR-2382.patch

Updated patch with 2 fixes for things I missed when porting this from 1.4.1 to
Trunk. Also added 1 more unit test.

I think this is ready for someone else to evaluate if anyone has the time and
desire. I do believe this would be a nice addition to the DIH product.

DIH Cache Improvements
--

Key: SOLR-2382
URL: https://issues.apache.org/jira/browse/SOLR-2382
Project: Solr
Issue Type: New Feature
Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch

Functionality:
1. Provide a pluggable caching framework for DIH so that users can choose a
cache implementation that best suits their data and application.

2. Provide a means to temporarily cache a child Entity's data without
needing to create a special cached implementation of the Entity Processor
(such as CachedSqlEntityProcessor).

3. Provide a means to write the final (root entity) DIH output to a cache
rather than to Solr. Then provide a way for a subsequent DIH call to use the
cache as an Entity input. Also provide the ability to do delta updates on
such persistent caches.

4. Provide the ability to partition data across multiple caches that can
then be fed back into DIH and indexed either to varying Solr Shards, or to
the same Core in parallel.
Use Cases:
1. We needed a flexible scalable way to temporarily cache child-entity
data prior to joining to parent entities.
- Using SqlEntityProcessor with Child Entities can cause an n+1 select
problem.
- CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching
mechanism and does not scale.
- There is no way to cache non-SQL inputs (ex: flat files, xml, etc).

2. We needed the ability to gather data from long-running entities by a
process that runs separate from our main indexing process.

3. We wanted the ability to do a delta import of only the entities that
changed.
- Lucene/Solr requires entire documents to be re-indexed, even if only a
few fields changed.
- Our data comes from 50+ complex sql queries and/or flat files.
- We do not want to incur overhead re-gathering all of this data if only 1
entity's data changed.
- Persistent DIH caches solve this problem.

4. We want the ability to index several documents in parallel (using 1.4.1,
which did not have the threads parameter).

5. In the future, we may need to use Shards, creating a need to easily
partition our source data into Shards.
Implementation Details:
1. De-couple EntityProcessorBase from caching.
- Created a new interface, DIHCache, and two implementations:
- SortedMapBackedCache - An in-memory cache, used as default with
CachedSqlEntityProcessor (now deprecated).
- BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested
with je-4.1.6.jar
- NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.
I believe this may be incompatible due to Generic Usage.
- NOTE: I did not modify the ant script to automatically get this jar,
so to use or evaluate this patch, download bdb-je from
http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html

2. Allow Entity Processors to take a cacheImpl parameter to cause the
entity data to be cached (see EntityProcessorBase and DIHCacheProperties).

3. Partially De-couple SolrWriter from DocBuilder
- Created a new interface, DIHWriter, and two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

4. Create a new Entity Processor, DIHCacheProcessor, which reads a
persistent Cache as DIH Entity Input.

5. Support a partition parameter with both DIHCacheWriter and
DIHCacheProcessor to allow for easy partitioning of source entity data.

6. Change the semantics of entity.destroy()
- Previously, it was being called on each iteration of
DocBuilder.buildDocument().
- Now it does one-time cleanup tasks (like closing or deleting a
disk-backed cache) once the entity processor is completed.
- The only out-of-the-box entity processor that previously implemented
destroy() was LineEntityProcessor, so this is not a very invasive change.
General Notes:
We are near completion in converting our search functionality from a legacy
search engine to Solr. However, I found that DIH did not support caching to
the level of our prior product's data import utility. In order to get our
data into Solr, I created these caching enhancements. Because I believe this
has broad application, and because we would like this feature

Re: Solr Cell DataImport Tika handler broken - fails to index Zip file contents

2011-03-16 Thread Jayendra Patil

Thanks Chris,

I have already opened a jira
https://issues.apache.org/jira/browse/SOLR-2416 for the issue with the
attached patch.

Regards,
Jayendra

On Wed, Mar 16, 2011 at 3:57 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : I had raised a jira for the Data Import handler part with the patch
 : and the testcase - https://issues.apache.org/jira/browse/SOLR-2332.
 : The same fix is needed for the Solr Cell as well.
 :
 : I can raise a jira and provide the patch for the same, if the patch
 : seems good enough.

 Jayendra: I'm not very familiar with your patch (or tika!) but by all means
 please open a jira for the bug, even if you are hesitant to work on a
 patch ... if you mention the issues in the comments for one another,
 people will see that they are related.


 -Hoss


[jira] Commented: (SOLR-2415) Change XMLWriter version parameter to wt.xml.version

2011-03-16 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007688#comment-13007688
 ] 

Hoss Man commented on SOLR-2415:


i'm with ryan.

if it had always been respversion or something i wouldn't mind, and would 
encourage other response writers to use it for their own versioning purposes 
(ie: the json writer could have changed the default for json.nl based on 
version).  but version is just so damn generic, it's really hard to have any 
idea what it's talking about.  (even xml.version is ambiguous: is it the format 
coming in, or going out?)

I'd suggest either adding wt.version or wt.xml.version (depending on how 
people feel about the idea that it should/can be reused by all response writers 
in their own way) to 3.x with a fallback to using version if it's specified and 
mark version deprecated ... then remove it completely at a much later date 
(maybe 4.0, depends on when it comes out and how many 3.x releases come first)
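
A minimal sketch of the fallback suggested above, assuming request
parameters are available as a plain Map. The class and method names are
hypothetical, and the default value is whatever the caller passes in; this
is not existing Solr code.

```java
import java.util.Map;

/** Hypothetical helper; not part of Solr. */
final class VersionParamFallback {
    static final String NEW_PARAM = "wt.xml.version";
    static final String OLD_PARAM = "version"; // would be marked deprecated

    /**
     * Prefers the new parameter, falls back to the old one if it is
     * specified, and otherwise returns the caller-supplied default.
     */
    static String resolve(Map<String, String> params, String defaultVersion) {
        String value = params.get(NEW_PARAM);
        if (value != null) {
            return value;
        }
        value = params.get(OLD_PARAM);
        if (value != null) {
            return value; // deprecated path, kept for compatibility
        }
        return defaultVersion;
    }
}
```

Dropping the old parameter at a later date would then only mean deleting
the second lookup.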



 Change XMLWriter version parameter to wt.xml.version
 --

 Key: SOLR-2415
 URL: https://issues.apache.org/jira/browse/SOLR-2415
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Trivial
 Fix For: 4.0


 The XMLWriter has a parameter called 'version'.  This controls some specifics 
 about how the XMLWriter works.  Using the parameter name 'version' made sense 
 back when the XMLWriter was the only option, but with all the various writers 
 and different places where 'version' makes sense, I think we should change 
 this parameter name to wt.xml.version so that it specifically refers to the 
 XMLWriter.


[jira] Issue Comment Edited: (SOLR-2399) Solr Admin Interface, reworked

2011-03-16 Thread Stefan Matheis (steffkes) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007691#comment-13007691
 ] 

Stefan Matheis (steffkes) edited comment on SOLR-2399 at 3/16/11 9:08 PM:
--

A bigger step compared with the ones we have had so far; we are talking about the 
*Schema-Browser*:

* [The current one 
(screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_cur.png] needs 
a lot of space (especially for the navigation)
* [The new one 
(screenshot)|http://files.mathe.is/solr-admin/06_schema-browser.png] tries to 
put the focus more on details & information
* [The new Field/Dynamic Field/Type-Selection 
(screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_nav.png] is 
displayed in a simple Dropdown, which offers Keyboard-Navigation for Quick-Access
* [The list of relations 
(screenshot)|http://files.mathe.is/solr-admin/06_schema-browser_var.png] 
depends on the selected F/DF/T

Just to say:
* CopyFields are also displayed in the area below the Selection, if defined.
* Every F/DF/T is clickable and linked to its Detail-Page

(Explicit) Questions for the Community:
* TopTerms are currently limited to 50, is that enough? Or is there a need to 
browse _all_ TopTerms?
* Analyzers-Detail, hide it by default - with a Toggle-Button (as it is 
currently)?
* Analyzers-Detail, is the Presentation okay - or does it need too much space?
* F/DF/T-Selection, currently there is no possibility to filter (like e.g. in 
iTunes; an additional field, start typing and the list is restricted) - would that 
help those of us who have a lot of fields?

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Priority: Minor

 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin 
 [This commit shows the 
 differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
  between old/existing index.jsp and my new one (which is 
 copy-cut/paste'd from the existing one).
 Main Action takes place in 
 [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
  which is currently neither clean nor pretty .. just work-in-progress.
 Currently it's Work in Progress, so ... give it a try. It's developed with 
 Firefox as Browser, so, for a first impression .. please don't use _things_ 
 like Internet Explorer or so ;o
 Jan already suggested a bunch of good things, i'm sure there are more ideas 
 over there :)


[jira] Commented: (SOLR-2415) Change XMLWriter version parameter to wt.xml.version

2011-03-16 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007695#comment-13007695
 ] 

Yonik Seeley commented on SOLR-2415:


On a highly related question, how should we handle the desire to change the 
faceting format (to make it easier to add metadata like total number of 
constraints, etc)?  version would be one way.  facet.format would be 
another way.



[Lucene.Net] [jira] Commented: (LUCENENET-399) Port changes from Java Lucene 2.9.3 and 2.9.4 releases

2011-03-16 Thread Digy (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENENET-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007698#comment-13007698
 ] 

Digy commented on LUCENENET-399:


Current status of my local work:
* All core files for the 2.9.2 -> 2.9.4 transition are ported.
  16 modified/added test files are still waiting to be fixed under 
Lucene.Net.Index + Lucene.Net.Store
* 12 test cases under Lucene.Net.Index & 1 case under Lucene.Net.Util fail 
(this is better judged after the remaining test files are ported).

So, what do you think? Should I commit this huge patch or wait till everything 
is completed?

DIGY.

 Port changes from Java Lucene 2.9.3 and 2.9.4 releases
 --

 Key: LUCENENET-399
 URL: https://issues.apache.org/jira/browse/LUCENENET-399
 Project: Lucene.Net
  Issue Type: Task
  Components: Lucene.Net Core, Lucene.Net Test
Reporter: Troy Howard
Assignee: Scott Lombard
 Fix For: Lucene.Net 2.9.4

  Time Spent: 2h
  Remaining Estimate: 40h

 Port changes from Java Lucene 2.9.3 and 2.9.4 releases. 
 The Lucene.Net 2.9.4 release will roll up the changes from both of those 
 releases into one. 
 Unit tests should be added or updated to reflect the changes. 


[jira] Resolved: (SOLR-2251) use facet key as override for field name when looking for per field facet options

2011-03-16 Thread Hoss Man (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2251.


Resolution: Duplicate

I just realized this is actually a dup of SOLR-1351

 use facet key as override for field name when looking for per field facet 
 options
 ---

 Key: SOLR-2251
 URL: https://issues.apache.org/jira/browse/SOLR-2251
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4.1
Reporter: Tim
Priority: Minor

 The key parameter that is used for aliasing output is very helpful in 
 simplifying the readability of complex facets.  However it doesn't seem that 
 this same alias can be used when configuring facets of individual fields.  
 The following example that does not use the key parameter works fine under 
 1.4.1:
 rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.customers_name.facet.mincount=2&facet.field=customers_name
 <lst name="customers_name">
   <int name="jone">2</int>
 </lst>
 The example below also works and does use the key parameter, however note 
 that we're still using the original field name when referring to 
 f.customers_name.facet.mincount:
 rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.customers_name.facet.mincount=2&facet.field={!key=alt_name}customers_name
 <lst name="customers_name">
   <int name="jone">2</int>
 </lst>
 The final example below does not work.  It uses the alias established by the 
 key parameter to configure the mincount setting for the customers_name field.
 rows=0&q=*:*+NOT+customers.blocked:1&facet=true&f.alt_name.facet.mincount=2&facet.field={!key=alt_name}customers_name
 <lst name="alt_name">
   <int name="jone">2</int>
   <int name="tim">1</int>
   <int name="sami">0</int>
 </lst>
 This is a trivial example.  The behavior becomes much more important when 
 talking about facet queries.
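
 The lookup order this issue asks for could be sketched as below. This is
 a hypothetical helper over a plain Map, not Solr's actual parameter
 handling: it tries the alias set via key first, then the real field name,
 then the global parameter.

```java
import java.util.Map;

/** Hypothetical sketch of the requested behaviour; not Solr code. */
final class FacetKeyLookup {

    /**
     * Resolves a per-field facet option, e.g. param = "facet.mincount".
     * Order: f.<key>.<param>, then f.<field>.<param>, then <param>.
     * Returns null if none of them is set.
     */
    static String fieldParam(Map<String, String> params,
                             String key, String field, String param) {
        String value = params.get("f." + key + "." + param);
        if (value == null) {
            value = params.get("f." + field + "." + param);
        }
        if (value == null) {
            value = params.get(param);
        }
        return value;
    }
}
```

 With the third request above, fieldParam(params, "alt_name",
 "customers_name", "facet.mincount") would find f.alt_name.facet.mincount
 and apply it to the customers_name field.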


[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-03-16 Thread Ahmet Arslan (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007700#comment-13007700
]

Ahmet Arslan commented on SOLR-1499:

Eric,

Thanks for the pointer. As you said, when I use

new CommonsHttpSolrServer(new URL("http://solr1.4.0Instance:8080/solr"), null,
new XMLResponseParser(), false);

I was able to communicate with a Solr 1.4.0 instance using solrj-trunk.

Do you recommend modifying this patch in this manner? Any performance hits?

Plus, what do you think about copy-pasting JavaBinCodec.java from the source
version to the destination version and using a custom BinaryResponseParser that
uses that copy-pasted class? It seems to work for 1.4.0 to trunk.

Or should I stick with writing a little script to do it?

P.S. I am just trying to use a feature that will already be maintained by the
Solr community.

SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via
SolrJ
-

Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch,
SOLR-1499.patch, SOLR-1499.patch


[jira] Commented: (LUCENE-1823) QueryParser with new features for Lucene 3

2011-03-16 Thread Adriano Crestani (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007737#comment-13007737
]

Adriano Crestani commented on LUCENE-1823:
--

Hi Robert,

I completely agree with your statement, the config API scares me also. I would
love to submit a patch for it, but I am working for IBM now, and, as a
committer, I need to go through some bureaucratic paperwork before doing any
new feature for Lucene and it might still take some time :(

I had a better idea, I will propose it to be a GSOC project for this year. This
way we can also get one more contributor to contrib QP.

QueryParser with new features for Lucene 3
--

Key: LUCENE-1823
URL: https://issues.apache.org/jira/browse/LUCENE-1823
Project: Lucene - Java
Issue Type: New Feature
Components: QueryParser
Reporter: Michael Busch
Assignee: Luis Alves
Priority: Minor
Fix For: 4.0

Attachments: lucene_1823_any_opaque_precedence_fuzzybug_v2.patch,
lucene_1823_foo_bug_08_26_2009.patch

I'd like to have a new QueryParser implementation in Lucene 3.1, ideally
based on the new QP framework in contrib. It should share as much code as
possible with the current StandardQueryParser implementation for easy
maintainability.
Wish list (feel free to extend):
1. *Operator precedence*: Support operator precedence for boolean operators
2. *Opaque terms*: Ability to plug in an external parser for certain syntax
extensions, e.g. XML query terms
3. *Improved RangeQuery syntax*: Use more intuitive <=, =, >= instead of []
and {}
4. *Support for trierange queries*: See LUCENE-1768
5. *Complex phrases*: See LUCENE-1486
6. *ANY operator*: E.g. (a b c d) ANY 3 should match if 3 of the 4 terms
occur in the same document
7. *New syntax for Span queries*: I think the surround parser supports this?
8. *Escaped wildcards*: See LUCENE-588
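
The intended semantics of the ANY operator from item 6 can be illustrated
with a small stand-alone sketch. The class and method names are
hypothetical and this is not parser code; it only shows the matching rule
"at least N of the listed terms occur in the document".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of "(a b c d) ANY 3"; not QueryParser code. */
final class AnyOperatorSketch {

    /** True if at least minMatch distinct query terms occur in the document. */
    static boolean matchesAny(Set<String> docTerms, List<String> queryTerms, int minMatch) {
        int hits = 0;
        // De-duplicate so a repeated query term is only counted once.
        for (String term : new HashSet<>(queryTerms)) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= minMatch;
    }
}
```

In Lucene itself, BooleanQuery.setMinimumNumberShouldMatch offers similar
behaviour for optional clauses, so a parser could plausibly map the
operator onto it.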


[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6021 - Failure

2011-03-16 Thread Apache Hudson Server

Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6021/

1 tests failed.
FAILED:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
expected:<2> but was:<3>

Stack Trace:
junit.framework.AssertionFailedError: expected:<2> but was:<3>
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1214)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1146)
at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:208)




Build Log (for compile errors):
[...truncated 8570 lines...]




Solr Config XML DTD's

2011-03-16 Thread Daniel Talsky

Hi, this is my first post to the mailing list.  I'm working on a commercial
implementation of a Solr project and would like to share some of my work,
although it's not really much.

I wrote a halting DTD for the Solr config file queryElevation.xml and would
like to eventually write a DTD for the config file.  Who do I need to talk
to about reviewing my work and perhaps getting a little help?

My DTD works for our internal version of queryElevation.xml, but since the
ATTRIB name of the <doc/> tag could be anything, I'm not sure how to write a
DTD that would validate any valid query elevation file.

Anyway, thanks.  I put pressure on our company to redo our customer-facing
search using Solr.  It launches soon and I've impressed everyone all the way
up to the CEO. Most of the credit goes to the Solr and Lucene devs for
making it so easy on me.

Daniel

[jira] Commented: (SOLR-2415) Change XMLWriter version parameter to wt.xml.version

2011-03-16 Thread Ryan McKinley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007805#comment-13007805
]

Ryan McKinley commented on SOLR-2415:
-

I see two approaches to the general problem:
1. each component gets its own version (wt.xml.version, facet.version,
hl.version, etc)
2. a single 'version' param that multiple components use.

I think option #2 makes more sense; perhaps we should add a getVersion()
method on SolrQueryRequest and have that used across all components.

For facet format (SOLR-2242) this should work, but I also hope that major
versions (4.0 etc) can drop old formats since maintaining these for a long time
can be a PIA.


[jira] Commented: (SOLR-2415) Change XMLWriter version parameter to wt.xml.version

2011-03-16 Thread Chris A. Mattmann (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007814#comment-13007814
 ] 

Chris A. Mattmann commented on SOLR-2415:
-

At the rate of release cycles on this project, I'd seriously recommend against 
actually specifying versions, and fallbacks, etc., specifically for response 
writers other than the existing Solr version. Look at how long the existing 
response writers have hung around in their current format, independent of the 
version # changes (1.2, 1.3, 1.4, and now 3.1). In all of these cases, you 
simply could keep docs that say 1.2 is compatible (forwards) with 1.x, etc., 
and 3.x is compatible (backwards) with 1.x, etc.


[jira] Commented: (SOLR-2399) Solr Admin Interface, reworked

[
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007824#comment-13007824
]

Mark Miller commented on SOLR-2399:
---

Hey Stefan,

I had seen this issue in passing, but had not yet taken a closer look...

Fantastic stuff! I think this is a sorely needed face lift, and your screen
shots look like a brilliant upgrade. Really nice to see some effort put into
this area of Solr.


[jira] Updated: (SOLR-2399) Solr Admin Interface, reworked