[jira] Commented: (LUCENENET-383) System.IO.IOException: read past EOF while deleting the file from upload folder of filemanager.

2010-12-05 Thread chaitanya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967101#action_12967101
 ] 

chaitanya commented on LUCENENET-383:
-

The error is thrown from the Lucene.Net.Store.BufferedIndexInput.Refill() method.
Please see the relevant code below:
if (this.bufferLength == 0)
{
    throw new IOException("read past EOF");
}

But we don't know when this bufferLength becomes zero.

Neal mentioned that it is possible that the document id they are
attempting to delete from the Lucene index does not exist. Maybe the above
error occurs in that case.

But the file exists in the upload folder. What we found is that when deleting
the file we get the above error, yet the file is deleted anyway. The only
annoying thing here is that even when the file deletion succeeds, we still get
this error.

What I suspect is that once the file is deleted, Lucene searches for the
ID a second time; by then the id is no longer available, hence the error.

If that is the situation, why is Lucene searching for this id
twice for a single request?





 System.IO.IOException: read past EOF while deleting the file from upload 
 folder of filemanager.
 ---

 Key: LUCENENET-383
 URL: https://issues.apache.org/jira/browse/LUCENENET-383
 Project: Lucene.Net
  Issue Type: Bug
 Environment: production
Reporter: chaitanya

 We are getting System.IO.IOException: read past EOF when deleting a file 
 from the upload folder of the file manager. It used to work fine, but for the 
 past few days we have been getting this error.
 We are using the EPiServer content management system, and EPiServer in turn 
 uses Lucene for indexing.
 Please find the stack trace of the error below. Help me overcome 
 this error. Thanks in advance.
 [IOException: read past EOF]
Lucene.Net.Store.BufferedIndexInput.Refill() +233
Lucene.Net.Store.BufferedIndexInput.ReadByte() +21
Lucene.Net.Store.IndexInput.ReadInt() +13
Lucene.Net.Index.SegmentInfos.Read(Directory directory) +60
Lucene.Net.Index.AnonymousClassWith.DoBody() +45
Lucene.Net.Store.With.Run() +67
Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean 
 closeDirectory) +110
Lucene.Net.Index.IndexReader.Open(String path) +65

 EPiServer.Web.Hosting.Versioning.Store.FileOperations.DeleteItemIdFromIndex(String
  filePath, Object fileId) +78
EPiServer.Web.Hosting.Versioning.Store.FileOperations.DeleteFile(Object 
 dirId, Object fileId) +118
EPiServer.Web.Hosting.Versioning.VersioningFileHandler.Delete() +28
EPiServer.Web.Hosting.VersioningFile.Delete() +118
EPiServer.UI.Hosting.UploadFile.ConfirmReplaceButton_Click(Object sender, 
 EventArgs e) +578
EPiServer.UI.WebControls.ToolButton.OnClick(EventArgs e) +107
EPiServer.UI.WebControls.ToolButton.RaisePostBackEvent(String 
 eventArgument) +135
System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, 
 String eventArgument) +13
System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData) +36
System.Web.UI.Page.ProcessRequestMain(Boolean 
 includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +1565

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE] Release PyLucene 2.9.4-1 and 3.0.3-1

2010-12-05 Thread Michael McCandless
+1 to both.

I installed both on Linux (Fedora 13) and ran my test python script
that indexes first 100K line docs from wikipedia and runs a few
searches.  No problems!

Mike

On Sun, Dec 5, 2010 at 1:50 AM, Andi Vajda va...@apache.org wrote:

 With the recent releases of Lucene Java 2.9.4 and 3.0.3, the PyLucene
 2.9.4-1 and 3.0.3-1 releases closely tracking them are ready.

 Release candidates are available from:

    http://people.apache.org/~vajda/staging_area/

 A list of changes in this release can be seen at:
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_2_9/CHANGES
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_0/CHANGES

 All versions of PyLucene are built with the same version of JCC, currently
 version 2.7, included in these release artifacts.

 A list of Lucene Java changes can be seen at:
 http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/CHANGES.txt
 http://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0/CHANGES.txt

 Please vote to release these artifacts as PyLucene 2.9.4-1 and 3.0.3-1.

 Thanks !

 Andi..

 ps: the KEYS file for PyLucene release signing is at:
    http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
    http://people.apache.org/~vajda/staging_area/KEYS

 pps: here is my +1



Re: [VOTE] Release PyLucene 2.9.4-1 and 3.0.3-1

2010-12-05 Thread Robert Muir
On Sun, Dec 5, 2010 at 1:50 AM, Andi Vajda va...@apache.org wrote:

 With the recent releases of Lucene Java 2.9.4 and 3.0.3, the PyLucene
 2.9.4-1 and 3.0.3-1 releases closely tracking them are ready.

 Release candidates are available from:

    http://people.apache.org/~vajda/staging_area/

 A list of changes in this release can be seen at:
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_2_9/CHANGES
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_0/CHANGES

 All versions of PyLucene are built with the same version of JCC, currently
 version 2.7, included in these release artifacts.

 A list of Lucene Java changes can be seen at:
 http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/CHANGES.txt
 http://svn.apache.org/repos/asf/lucene/java/branches/lucene_3_0/CHANGES.txt

 Please vote to release these artifacts as PyLucene 2.9.4-1 and 3.0.3-1.


+1, everything looks in order, building pylucene and running 'make
test' seemed fine on both versions.


Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread DM Smith
The following code works on Android with 2.9.1, but fails with 3.0.2:

Directory dir = FSDirectory.open(file);
...
do something with directory
...

The error we're seeing is:
12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: 
java.lang.management.ManagementFactory
12-04 21:34:41.639: WARN/System.err(23160): at 
org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:87)
12-04 21:34:41.639: WARN/System.err(23160): at 
org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:142)
12-04 21:34:41.649: WARN/System.err(23160): at 
org.apache.lucene.store.Directory.makeLock(Directory.java:106)
12-04 21:34:41.649: WARN/System.err(23160): at 
org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1058)

Turns out Android does not have java.lang.management.ManagementFactory. 

There are several work arounds in client code, but not sure what is best.

The bigger question is whether and how Lucene should be modified to accommodate this.

Ultimately FSDirectory.open does the following:
if (Constants.WINDOWS) {
  return new SimpleFSDirectory(path, lockFactory);
} else {
  return new NIOFSDirectory(path, lockFactory);
}

Should Android be a supported client OS?

If so, wouldn't it be better not to have OS specific if-then-else and use 
reflection or something else?
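
Not Lucene code, just a minimal client-side sketch of one such workaround,
assuming the 3.0.x signatures FSDirectory.open(File, LockFactory) and
new SimpleFSLockFactory() (the class name AndroidSafeOpen is made up):

import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.SimpleFSLockFactory;

public final class AndroidSafeOpen {
  public static Directory open(File path) throws IOException {
    LockFactory lf;
    try {
      // NativeFSLockFactory (the 3.0.2 default) needs this class.
      Class.forName("java.lang.management.ManagementFactory");
      lf = null; // null means: keep the default lock factory
    } catch (ClassNotFoundException missingOnAndroid) {
      lf = new SimpleFSLockFactory(); // avoids the native test lock entirely
    }
    return FSDirectory.open(path, lf);
  }
}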

Thanks,
DM
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Gérard Dupont
On 5 December 2010 00:16, DM Smith dm-sm...@woh.rr.com wrote:

 Should Android be a supported client OS?
 If so, wouldn't it be better not to have OS specific if-then-else and use
 reflection or something else?


Well, Lucene relies only on the standard JVM API. The fact that Android is
using a non-standard JVM is IMHO outside the scope of Lucene.

-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC)
CASSIDIAN - an EADS company

Document & Learning team - LITIS Laboratory


RE: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Uwe Schindler
Hi DM,

In Lucene 3.0.3, NativeFSLockFactory no longer acquires a test lock and does
not need the process ID anymore, so the java.lang.management package is no
longer used.

In general, Lucene Java is compatible with the Java 5 SE specification.
Android uses Harmony, and therefore we cannot guarantee compatibility, as
Harmony is not TCK tested (but we do test with the latest versions, and soon
there will also be tests on Hudson with Harmony). But only the latest versions
of Harmony are really compatible with Lucene; previous versions fail lots of
tests (ask Robert), and Android phones use very antique versions of Harmony -
it is not even certain that the Java 5 Memory Model is correctly implemented
in Dalvik!

About 3.0.2: of course this version works even with the latest Harmony, so
Harmony has the java.lang.management package (which is java.lang!!!). The bug
is therefore in Android, which simply excludes an SE package. So you should
open a bug report at Google and then hope that they fix it and that all the
phone manufacturers like Motor-Roller will update their Android versions.

For your problem: the easy workaround is to use Lucene 3.0.3, or simply use
another LockFactory (Android is single user, so even NoLockFactory would be
fine in most cases). These are the same limitations as with the NFS
filesystem. Just use FSDir.open(dir, lockFactory).
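
For example, a minimal sketch (the index path here is made up;
NoLockFactory.getNoLockFactory() is the 3.0.x accessor):

import java.io.File;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NoLockFactory;

// Android is effectively a single-user environment, so skipping index
// locking entirely is acceptable in most cases.
Directory dir = FSDirectory.open(new File("/sdcard/myindex"),
    NoLockFactory.getNoLockFactory());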

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: DM Smith [mailto:dm-sm...@woh.rr.com]
 Sent: Sunday, December 05, 2010 12:16 AM
 To: dev@lucene.apache.org
 Subject: Exception in migrating from 2.9.x to 3.0.2 on Android
 
 The following code works on Android with 2.9.1, but fails with 3.0.2:
 
 Directory dir = FSDirectory.open(file);
 ...
 do something with directory
 ...
 
 The error we're seeing is:
 12-04 21:34:41.629: WARN/System.err(23160):
 java.lang.NoClassDefFoundError:
 java.lang.management.ManagementFactory
 12-04 21:34:41.639: WARN/System.err(23160): at
 org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLock
 Factory.java:87)
 12-04 21:34:41.639: WARN/System.err(23160): at
 org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactor
 y.java:142)
 12-04 21:34:41.649: WARN/System.err(23160): at
 org.apache.lucene.store.Directory.makeLock(Directory.java:106)
 12-04 21:34:41.649: WARN/System.err(23160): at
 org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1058)
 
 Turns out Android does not have
 java.lang.management.ManagementFactory.
 
 There are several work arounds in client code, but not sure what is best.
 
 The bigger question is whether and how Lucene should be modified to
 accommodate this.
 
 Ultimately FSDirectory.open does the following:
 if (Constants.WINDOWS) {
   return new SimpleFSDirectory(path, lockFactory);
 } else {
   return new NIOFSDirectory(path, lockFactory);
 }
 
 Should Android be a supported client OS?
 
 If so, wouldn't it be better not to have OS specific if-then-else and use
 reflection or something else?
 
 Thanks,
   DM
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 2218 - Failure

2010-12-05 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2218/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:466)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
at 
org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)




Build Log (for compile errors):
[...truncated 8716 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2798) Randomize indexed collation key testing

2010-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966933#action_12966933
 ] 

Robert Muir commented on LUCENE-2798:
-

Steven, before working too hard on the JDK collation tests, I just had this 
idea:

Are we sure we shouldn't deprecate the JDK collation functionality (remove in 
trunk) and only offer ICU?

I was just thinking that the JDK Collator integration is basically a RAM trap 
due to its awful key size, etc.:
http://site.icu-project.org/charts/collation-icu4j-sun



 Randomize indexed collation key testing
 ---

 Key: LUCENE-2798
 URL: https://issues.apache.org/jira/browse/LUCENE-2798
 Project: Lucene - Java
  Issue Type: Test
  Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0


 Robert Muir noted on #lucene IRC channel today that Lucene's indexed 
 collation key testing is currently fragile (for example, they had to be 
 revisited when Robert upgraded the ICU dependency in LUCENE-2797 because of 
 Unicode 6.0 collation changes) and coverage is trivial (only 5 locales 
 tested, and no collator options are exercised).  This affects both the JDK 
 implementation in {{modules/analysis/common/}} and the ICU implementation 
 under {{modules/icu/}}.
 The key thing to test is that the order of the indexed terms is the same as 
 that provided by the Collator itself.  Instead of the current set of static 
 tests, this could be achieved via indexing randomly generated terms' 
 collation keys (and collator options) and then comparing the index terms' 
 order to the order provided by the Collator over the original terms.
 Since different terms may produce the same collation key, however, the order 
 of indexed terms is inherently unstable.  When performing runtime collation, 
 the Collator addresses the sort stability issue by adding a secondary sort 
 over the normalized original terms.  In order to directly compare Collator's 
 sort with Lucene's collation key sort, a secondary sort will need to be 
 applied to Lucene's indexed terms as well. Robert has suggested indexing the 
 original terms in addition to their collation keys, then using a Sort over 
 the original terms as the secondary sort.
 Another complication: Lucene 3.X uses Java's UTF-16 term comparison, and 
 trunk uses UTF-8 order, so the implemented secondary sort will need to 
 respect that.
 From #lucene:
 {quote}
 rmuir__: so i think we have to on 3.x, sort the 'expected list' with 
 Collator.compare, if thats equal, then as a tiebreak use String.compareTo
 rmuir__: and in the index sort on the collated field, followed by the 
 original term
 rmuir__: in 4.x we do the same thing, but dont use String.compareTo as the 
 tiebreak for the expected list
 rmuir__: instead compare codepoints (iterating character.codepointAt, or 
 comparing .getBytes(UTF-8))
 {quote}
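
A minimal sketch of the two tiebreak comparators described in the quote
(illustrative only, not from a patch; the class and method names are made up):

{code}
import java.text.Collator;
import java.util.Comparator;

final class ExpectedTermOrder {
  // 3.x: Collator order, ties broken by UTF-16 order (String.compareTo).
  static Comparator<String> forBranch3x(final Collator collator) {
    return new Comparator<String>() {
      public int compare(String a, String b) {
        int cmp = collator.compare(a, b);
        return cmp != 0 ? cmp : a.compareTo(b);
      }
    };
  }

  // trunk: ties broken by code point order, which matches UTF-8 byte order.
  static Comparator<String> forTrunk(final Collator collator) {
    return new Comparator<String>() {
      public int compare(String a, String b) {
        int cmp = collator.compare(a, b);
        if (cmp != 0) return cmp;
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
          int ca = a.codePointAt(i), cb = b.codePointAt(j);
          if (ca != cb) return ca - cb;
          i += Character.charCount(ca);
          j += Character.charCount(cb);
        }
        return (a.length() - i) - (b.length() - j); // shorter prefix sorts first
      }
    };
  }
}
{code}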

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2763) Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer

2010-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966943#action_12966943
 ] 

Robert Muir commented on LUCENE-2763:
-

+1, looks good to me.


 Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer
 ---

 Key: LUCENE-2763
 URL: https://issues.apache.org/jira/browse/LUCENE-2763
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2763.patch


 Currently, in addition to implementing the UAX#29 word boundary rules, 
 StandardTokenizer recognizes email addresses and URLs, but doesn't provide a 
 way to turn this behavior off and/or provide overlapping tokens with the 
 components (username from email address, hostname from URL, etc.).
 UAX29Tokenizer should become StandardTokenizer, and current StandardTokenizer 
 should be renamed to something like UAX29TokenizerPlusPlus (or something like 
 that).
 For rationale, see [the discussion at the reopened 
 LUCENE-2167|https://issues.apache.org/jira/browse/LUCENE-2167?focusedCommentId=12929325&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12929325].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene-Solr-tests-only-trunk - Build # 2221 - Failure

2010-12-05 Thread Yonik Seeley
Well, darn, upgrading jetty didn't seem to help this.

-Yonik
http://www.lucidimagination.com



On Sun, Dec 5, 2010 at 7:05 AM, Apache Hudson Server
hud...@hudson.apache.org wrote:
 Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2221/

 1 tests failed.
 REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

 Error Message:
 Some threads threw uncaught exceptions!

 Stack Trace:
 junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
        at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:466)
        at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
        at 
 org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)




 Build Log (for compile errors):
 [...truncated 8716 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene-Solr-tests-only-trunk - Build # 2221 - Failure

2010-12-05 Thread Robert Muir
On Sun, Dec 5, 2010 at 9:00 AM, Yonik Seeley yo...@lucidimagination.com wrote:
 Well, darn, upgrading jetty didn't seem to help this.


I was getting really hopeful for a while!

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene-Solr-tests-only-trunk - Build # 2211 - Failure

2010-12-05 Thread Robert Muir
On Sun, Dec 5, 2010 at 1:46 AM, Apache Hudson Server
hud...@hudson.apache.org wrote:
 Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2211/

 1 tests failed.
 REGRESSION:  org.apache.solr.update.AutoCommitTest.testMaxTime


There's still a timing issue in this test I think. I modified it a
while ago to make it better but Hoss mentioned on the mailing list
some way we could change it to not be fragile...

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-1979:
-

Assignee: Grant Ingersoll

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.
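
A minimal sketch of that core step (illustrative only, not the attached patch;
inputFields, outputField and fallback stand for the configured values):

{code}
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.language.LanguageIdentifier;

void detectAndStore(SolrInputDocument doc, String[] inputFields,
                    String outputField, String fallback) {
  // Concatenate the configured input fields...
  StringBuilder text = new StringBuilder();
  for (String field : inputFields) {
    Object value = doc.getFieldValue(field);
    if (value != null) text.append(value).append(' ');
  }
  // ...run Tika's detector, and use the fallback when it is not confident.
  LanguageIdentifier identifier = new LanguageIdentifier(text.toString());
  String lang = identifier.isReasonablyCertain()
      ? identifier.getLanguage() : fallback;
  doc.setField(outputField, lang);
}
{code}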

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2244) Add Language Identification support

2010-12-05 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-2244.
---

Resolution: Won't Fix

Actually, I'm going to switch back to SOLR-1979, as it is a superset of this 
patch.  I should have a patch up shortly.

 Add Language Identification support
 ---

 Key: SOLR-2244
 URL: https://issues.apache.org/jira/browse/SOLR-2244
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Attachments: solr2244.patch


 For starters, Tika has language identification capabilities that we can 
 likely leverage, but moreover, make it easier for people to plug in language 
 identification into the indexing process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966955#action_12966955
 ] 

Grant Ingersoll commented on SOLR-1979:
---

See http://wiki.apache.org/solr/LanguageDetection for the start of 
documentation.

bq. isReasonablyCertain() always returns false

See TIKA-568.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2010-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966963#action_12966963
 ] 

Robert Muir commented on LUCENE-2793:
-

There is another problem we should solve here, and that is the buffer size 
problem.

This is totally broken at the moment for custom directories; here's an example.
I wanted to set the buffer size to 4096 by default (since I measured this as 
roughly a 20% improvement for my directory impl).

Looking at the APIs, you would think that you simply override the openInput that 
takes no buffer size, like this:
{noformat}
  @Override
  public IndexInput openInput(String name) throws IOException {
    return openInput(name, 4096);
  }
{noformat}

Unfortunately this doesn't work at all! Instead you have to do something like 
this for it to actually work:
{noformat}
  @Override
  public IndexInput openInput(String name, int bufferSize) throws IOException {
    ensureOpen();
    return new IndexInput(name, Math.max(bufferSize, 4096));
  }
{noformat}

The problem is, throughout Lucene's APIs, the directory's default is never 
used; instead the static BufferedIndexInput.BUFFER_SIZE is used everywhere, 
e.g. SegmentReader.get:

{noformat}
  public static SegmentReader get(boolean readOnly, SegmentInfo si,
      int termInfosIndexDivisor) throws CorruptIndexException, IOException {
    return get(readOnly, si.dir, si, BufferedIndexInput.BUFFER_SIZE, true,
        termInfosIndexDivisor);
  }
{noformat}

So I think Lucene's APIs should never specify a buffer size; we should remove it 
completely from the codecs API, and it should be *replaced* with IOContext.
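
A rough sketch of what that replacement could look like (all names here are
hypothetical, not from any patch):

{noformat}
// Hypothetical IOContext: callers say *why* they are opening a file, and each
// Directory derives its own buffer size (or DIRECT/SEQUENTIAL flags) from it.
public enum IOContext { READ, MERGE, FLUSH }

public abstract class Directory {
  // No bufferSize parameter anywhere; the context travels with every call.
  public abstract IndexInput openInput(String name, IOContext context)
      throws IOException;
  public abstract IndexOutput createOutput(String name, IOContext context)
      throws IOException;
}
{noformat}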


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966964#action_12966964
 ] 

Jan Høydahl commented on SOLR-1979:
---

Simply allowing the threshold for isReasonablyCertain() to be set is probably 
not enough to get robust detection. This is because the distance measure is very 
sensitive to the length of the profiles in use. Thus, it is a bit dangerous to 
expose getDistance() as in TIKA-568, because that distance measure is kind of an 
internal value, not very normalized, and is bound to change in future versions 
of TIKA.

See TIKA-369 and TIKA-496.

I think the right way to go is to solve these two issues first. By fixing 
getDistance() so that it is not biased towards profile length, we can make a new 
isReasonablyCertain() implementation that takes into account the relative 
distance between the first and second candidate languages...

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966970#action_12966970
 ] 

Jan Høydahl commented on SOLR-1979:
---

The idField input parameter is just used for decent logging if detection fails. 
It would be more elegant to get the id field name automatically through 
SolrCore...

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2158) TestDistributedSearch.testDistribSearch fails often

2010-12-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966971#action_12966971
 ] 

Yonik Seeley commented on SOLR-2158:


OK, so we upgraded jetty... but the "failed to respond" exception still happens.
Just to try and narrow things down, I put a long sleep inside solr request 
handling and then tried a distributed search... it worked fine. So
it doesn't appear to be something getting hung up in Solr. That leaves:

- a jetty bug
- an embedded jetty bug
- an HttpClient bug
- a bug in the way solr uses HttpClient

Another data point: with my load testing tool, I can run millions of requests 
against Jetty/Solr (and I just did again). It doesn't use HttpClient though, 
and it uses GET instead of POST.

Some things to try:
 - Modify the load tool to use POST and verify things still work
 - Put a long pause in TestDistributedSearch after the solr servers are brought 
   up, and then try load testing against those servers w/ an external tool.
   - if this fails, we know it's an issue with how we embed Jetty
 - Make a load testing tool that uses SolrJ exactly the way that distributed 
   search uses it, and try it on a normal Solr server
   - if this fails, it could be an HttpClient bug, or a jetty bug tickled by 
     HttpClient specifically
   - if this fails, make a small self-contained load tool that uses only 
     HttpClient to remove the possibility of SolrJ bugs 

 TestDistributedSearch.testDistribSearch fails often
 ---

 Key: SOLR-2158
 URL: https://issues.apache.org/jira/browse/SOLR-2158
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 3.1, 4.0
 Environment: Hudson
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: TEST-org.apache.solr.TestDistributedSearch.txt


 TestDistributedSearch.testDistribSearch fails often in hudson, with some 
 threads throwing uncaught exceptions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966972#action_12966972
 ] 

Robert Muir commented on SOLR-1979:
---

bq. because that distance measure is kind of an internal value, not very 
normalized, and is bound to change in future versions of TIKA.

bq. we can make a new isReasonablyCertain() implementation that takes into 
account the relative distance between the first and second candidate 
languages...

I don't follow the logic: if it's not very normalized, then it seems like this 
approach doesn't tell you anything... language 1 could be uncertain,
and language 2 just completely uncertain, but that tells you nothing. Isn't 
this like trying to determine whether a good lucene search result score is 
certainly a hit, and not really the right way to go?

For example: consider the case where the language isn't supported at all by 
Tika (I don't see a list of supported languages anywhere, by the way!).
It would be good for us to know that the detection is uncertain at all; how 
relatively uncertain it is with regard to the next language is not very 
important.

I think it's also important that we be able to get this uncertainty, or 
whatever the differentiator is, agnostic of the implementation.
For example, we should be able to somehow think of chaining detectors... 

It's really important to be able to cheat and not use heuristics for languages 
that don't need them.
For example, disregarding some strange theoretical/historical cases, you can 
simply look at the unicode properties 
in the document to determine that it is in Greek, as Greek is basically 
the only modern language using the Greek alphabet.
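
A minimal sketch of that kind of script-based shortcut (illustrative only; the
90% threshold is made up):

{code}
// If nearly every letter falls in the Greek Unicode block, we can label the
// text as Greek without consulting any statistical n-gram model.
static boolean looksGreek(String text) {
  int greek = 0, letters = 0;
  for (int i = 0; i < text.length(); i += Character.charCount(text.codePointAt(i))) {
    int cp = text.codePointAt(i);
    if (Character.isLetter(cp)) {
      letters++;
      if (Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.GREEK) greek++;
    }
  }
  return letters > 0 && greek * 10 >= letters * 9; // >= 90% Greek letters
}
{code}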


 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (SOLR-2158) TestDistributedSearch.testDistribSearch fails often

2010-12-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966718#action_12966718
 ] 

Yonik Seeley edited comment on SOLR-2158 at 12/5/10 10:38 AM:
--

Moving Robert's stack trace from the description to the comments.

{code}
[junit] Testsuite: org.apache.solr.TestDistributedSearch
[junit] Testcase: testDistribSearch(org.apache.solr.TestDistributedSearch): 
FAILED
[junit] Some threads threw uncaught exceptions!
[junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
exceptions!
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
[junit] at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:416)
[junit] at 
org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:76)
[junit] at 
org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)
[junit] 
[junit] 
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 382.297 sec
[junit] 
[junit] - Standard Error -
[junit] 2010. 10. 15 ?? 2:08:04 org.apache.solr.common.SolrException log
[junit] ??: org.apache.solr.common.SolrException: 
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this request
[junit] at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:318)
[junit] at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
[junit] at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)
[junit] at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
[junit] at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
[junit] at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
[junit] at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
[junit] at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
[junit] at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
[junit] at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
[junit] at org.mortbay.jetty.Server.handle(Server.java:326)
[junit] at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
[junit] at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
[junit] at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
[junit] at 
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
[junit] at 
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
[junit] at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
[junit] at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
[junit] Caused by: org.apache.solr.client.solrj.SolrServerException: No 
live SolrServers available to handle this request
[junit] at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:297)
[junit] at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:513)
[junit] at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:478)
[junit] at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
[junit] at java.util.concurrent.FutureTask.run(FutureTask.java:166)
[junit] at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
[junit] at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
[junit] at java.util.concurrent.FutureTask.run(FutureTask.java:166)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
[junit] at java.lang.Thread.run(Thread.java:636)
[junit] Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.net.ConnectException: Operation timed out
[junit] at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
[junit] at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
[junit] at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:274)
[junit] ... 10 more
[junit] Caused by: java.net.ConnectException: Operation timed out
[junit] at 

[jira] Updated: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1979:
--

Attachment: SOLR-1979.patch

I took Jan's and Tommaso's patches and reworked them a bit.  It seems to me 
that there isn't much point in merely identifying the language if you aren't 
going to do something about it.  So, this patch builds on what Jan and Tommaso 
did and then will remap the input fields to new per-language fields (note, we 
could make this optional).  I also tried to standardize the input parameters a 
bit.  I dropped the outputField setting and a number of other settings, and I 
made the language detection per input field.  The basic gist of it is 
that if you input two fields, name and subject, it will detect the language of 
each field and then attempt to map them to a new field.  The new field is made 
by concatenating the original field name with "_" + the ISO 639 code.  For 
example, if en is the detected language, then the new field for name would be 
name_en.  If that field doesn't exist, it will fall back to the original field 
(i.e. name).
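
A minimal sketch of that remapping rule (illustrative only, not taken from the
attached patch; schemaFields stands for the set of declared field names):

{code}
import java.util.Set;

// "name" detected as English -> "name_en" if the schema declares it,
// otherwise the content stays in the original "name" field.
static String mapField(Set<String> schemaFields, String field, String langCode) {
  String langField = field + "_" + langCode;
  return schemaFields.contains(langField) ? langField : field;
}
{code}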

Left to do:
# Fix the tests.  I don't like how we currently test UpdateProcessorChains.  
It should not require writing your own little piece of update mechanism.  You 
should be able to simply setup the appropriate configuration, hook it into an 
update handler and then hit that update handler.  
# Need to check the license headers, builds, etc.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966978#action_12966978
 ] 

Robert Muir commented on SOLR-1979:
---

We really need to not be using ISO 639-1 here. 

For example,
it's not expressive enough: it doesn't differentiate between Simplified and 
Traditional Chinese, yet SmartChineseAnalyzer only works on Simplified.

I would like to see RFC 3066 used instead.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-05 Thread Mattmann, Chris A (388J)
Hi Mark,

RE: the credit system. JIRA provides a contribution report here, like this one 
that I generated for Lucene 3.1:

http://s.apache.org/BpL

Just click on Reports > Contribution Report in the upper right of JIRA on the 
main project summary page.

We've been using this in Tika since the beginning to indicate contributions 
from folks and it's worked well.

Cheers,
Chris

On Dec 4, 2010, at 10:03 PM, Mark Miller wrote:

 I like this idea myself - it would encourage better JIRA summaries and 
 reduce duplication.
 
 It's easy to keep a mix of old and new too - keep the things that Grant 
 mentions in CHANGES.txt (back compat migration, misc info), but you can 
 also just export a text Changes from JIRA at release and add that (along 
 with a link). Certainly nice to have a 'hard' copy.
 
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12315147&styleName=Text&projectId=12310110&Create=Create
 
 The only thing I don't like is the loss of the current credit system - I 
 like that better than the crawl through JIRA method. I think prominent 
 credits are a good encouragement for new contributors.
 
 Any comments on that?
 
 - Mark
 
 On 12/2/10 11:46 AM, Grant Ingersoll wrote:
 I think we should drop the item by item change list and instead focus on 3 
 things:
 1. Prose describing the new features (see Tika's changes file for instance) 
 and things users should pay special attention to such as when they might 
 need to re-index.
 2. Calling out explicit compatibility breaks
 3. A Pointer to full list of changes in JIRA.  Alternatively, I believe 
 there is a way in JIRA to export/generate a summary of all issues fixed.
 
 #1 can be done right before release simply by going through #3 and doing the 
 appropriate wordsmithing.  #2 should be tracked as it is found.
 
 It's kind of silly that we have all this duplication of effort built in, not 
 to mention having to track it across two branches.
 
 We do this over in Mahout and I think it works pretty well and reduces the 
 duplication quite a bit since everything is already in JIRA and JIRA 
 produces nice summaries too.  It also encourages people to track things 
 better in JIRA.  #1 above also lends itself well as the basis of press 
 releases/blogs/etc.
 
 -Grant
 
 
 On Dec 1, 2010, at 11:54 AM, Michael McCandless wrote:
 
 So, going forward...
 
 When committing an issue that needs a changes entry, where are we
 supposed to put it?
 
 EG if it's a bug fix that we'll backport all the way to 2.9.x... where
 does it go?
 
 If it's a new feature/API that's going to 3.x and trunk... only in
 3.x's CHANGES?
 
 Mike
 
 On Wed, Dec 1, 2010 at 9:22 AM, Uwe Schindler u...@thetaphi.de wrote:
 Hi all,
 
 when merging changes done in 2.9.4/3.0.3 with current 3.x and trunk, I found
 out that the 3.x changes differ immensely between the trunk CHANGES.txt and the
 3.x CHANGES.txt. Some entries are missing in the 3.x branch but are
 available in trunk's 3.x part, and other entries using new trunk class names
 appear among the 3.x changes in trunk.
 
 I copied the 3.x branch CHANGES.txt over trunk's 3.x section and
 attached a patch of this. What should we do? It's messy :( Most parts seem to
 be merge failures. We should go through all those diff'ed issues and check
 where they were really fixed (3.x or trunk) and move the entries
 accordingly. After that, the 3.x branch and trunk's 3.x section of
 CHANGES.txt should contain identical text!
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Changes Mess

2010-12-05 Thread Robert Muir
On Sun, Dec 5, 2010 at 12:08 PM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Mark,

 RE: the credit system. JIRA provides a contribution report here, like this 
 one that I generated for Lucene 3.1:


My concern with this is that it leaves out important email contributors.

For example, if a user reports a bug, we typically include their name
in CHANGES.txt.
The user who reports the bug does the hard work of finding that
there is a bug and reporting it to us.
Additionally, sometimes they do extra stuff: boiling the problem down
to a certain piece of code, turning it into a test case, etc., even if they
don't know how to fix the bug.
Then again, maybe they are a solr user who doesn't even know the java
programming language but finds a nasty bug in lucene.

In all cases I think if a user finds a bug and we fix it, it's
important we credit them, as we should encourage people to find bugs :)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2266) java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord()

2010-12-05 Thread Peter Wolanin (JIRA)
java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate 
field in a boost function with rord()


 Key: SOLR-2266
 URL: https://issues.apache.org/jira/browse/SOLR-2266
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4.1
 Environment: Mac OS 10.6
java version 1.6.0_22
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)

Reporter: Peter Wolanin



I have been testing a switch to long and tdate instead of int and date fields 
in the schema.xml for our Drupal integration.  This indexes fine, but search 
fails with a 500 error.

{code}
INFO: [d7] webapp=/solr path=/select 
params={spellcheck=true&facet=true&facet.mincount=1&indent=1&spellcheck.q=term&json.nl=map&wt=json&rows=10&version=1.2&fl=id,entity_id,entity,bundle,bundle_name,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=term&bf=recip(rord(created),4,19,19)^200.0}
 status=500 QTime=4 
Dec 5, 2010 11:52:28 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 39
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:721)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
at 
org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:61)
at 
org.apache.solr.search.function.TopValueSource.getValues(TopValueSource.java:57)
at 
org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
at 
org.apache.solr.search.function.FunctionQuery$AllScorer.init(FunctionQuery.java:123)
at 
org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1101)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at com.acquia.search.HmacFilter.doFilter(HmacFilter.java:62)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
{code}

The exception goes away if I remove the boost function param 
bf=recip(rord(created),4,19,19)^200.0

Omitting the recip() doesn't help, so just bf=rord(created)^200.0 still causes 
the exception.

In 

[jira] Resolved: (LUCENE-1541) Trie range - make trie range indexing more flexible

2010-12-05 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1541.
---

Resolution: Won't Fix

I don't think a fix is needed anymore.

 Trie range - make trie range indexing more flexible
 ---

 Key: LUCENE-1541
 URL: https://issues.apache.org/jira/browse/LUCENE-1541
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.9
Reporter: Ning Li
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-1541.patch, LUCENE-1541.patch


 In the current trie range implementation, a single precision step is 
 specified. With a large precision step (say 8), a value is indexed in fewer 
 terms (8) but the number of terms for a range can be large. With a small 
 precision step (say 2), the number of terms for a range is smaller but a 
 value is indexed in more terms (32).
 We want to add an option that different precision steps can be set for 
 different precisions. An expert can use this option to keep the number of 
 terms for a range small and at the same time index a value in a small number 
 of terms. See the discussion in LUCENE-1470 that results in this issue.
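 As a concrete illustration of the trade-off, a sketch against the NumericField/NumericRangeQuery API that replaced the contrib trie classes (assuming the 3.0 API; the field name, value, and step are arbitrary):
{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;

public class PrecisionStepDemo {
  public static void main(String[] args) {
    // precisionStep 8 on a long value: 64/8 = 8 indexed terms per value
    Document doc = new Document();
    doc.add(new NumericField("price", 8, Field.Store.NO, true).setLongValue(1234L));

    // a larger step means fewer terms per value, but more terms have to
    // be visited per range query (and vice versa)
    NumericRangeQuery<Long> q =
        NumericRangeQuery.newLongRange("price", 8, 1000L, 2000L, true, true);
    System.out.println(q);
  }
}
{code}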

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Path to jquery?

2010-12-05 Thread Eric Pugh
You are quite right.  I put a bug into JIRA; basically the layout.vm was 
referring to an older version of jquery than what was in the Solr.war file!  I 
do think, though, that having everything all in the /velocity directory would 
make it easier for someone who is new to Solr to grok how to customize the 
/browse interface!  Most folks do NOT want to be adding/hacking files in the 
solr.war, they just want to use what is distributed!

Eric



On Dec 2, 2010, at 4:45 PM, Ryan McKinley wrote:

 jquery is actually in the .war file, so you read it directly from the server.
 
 The file?file=/velocity... request streams content from inside your
 solr configuration directory
 
 
 
 On Thu, Dec 2, 2010 at 10:35 AM, Eric Pugh
 ep...@opensourceconnections.com wrote:
 Hi all,
 
 Looking at Solr 3.x, it seems like that path to jquery fails if you are 
 using multicore.
 
 In layout.vm there is:
 
 <script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.2.3.min.js"></script>
 
 However, for other files it is specified via:
 
  <script type="text/javascript" src="#{url_for_solr}/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript"></script>
 
 
 Thinking that the URL for jquery should be built the same way as the one for 
 jquery.autocomplete.js, and that jquery should be packaged in the /velocity directory as well???
 
 Eric
 
 
 -
 Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
 http://www.opensourceconnections.com
 Co-Author: Solr 1.4 Enterprise Search Server available from 
 http://www.packtpub.com/solr-1-4-enterprise-search-server
 Free/Busy: http://tinyurl.com/eric-cal
 
 
 
 
 
 
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from 
http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal









-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Changes Mess

2010-12-05 Thread Steven A Rowe
On 12/5/2010 at 12:19 PM, Robert Muir wrote:
 On Sun, Dec 5, 2010 at 12:08 PM, Mattmann, Chris A (388J)
 chris.a.mattm...@jpl.nasa.gov wrote:
  Hi Mark,
 
  RE: the credit system. JIRA provides a contribution report here, like
  this one that I generated for Lucene 3.1:
 
 
 My concern with this is that it leaves out important email contributors.

I agree, this is a serious problem.

My additional problems with JIRA-generated changes:

1. Huge undifferentiated change lists are frightening and nearly useless, 
regardless of the quality of the descriptions.

JIRA's issue types are:
 
Bug, New Feature, Improvement, Test, Wish, Task

Even if we used JIRA's issue types to group issues, they
are not the same as Lucene's CHANGES.txt issue types:

Changes in backwards compatibility policy, 
Changes in runtime behavior, 
API Changes, Documentation, Bug fixes, New features,
Optimizations, Build, Test Cases, Infrastructure

(I left out Requirements, last used in 2006 under release
1.9 RC1, since Build seems to have replaced it.)

2. There are now four separate CHANGES.txt files in the Lucene code base, 
excluding Solr and its modules (each of which has one of them).  This number 
will only grow as more Lucene contribs become modules.

The JIRA project components list is outdated / incomplete
/ has different granularity than the CHANGES.txt locations,
so using it to group JIRA issues would not work because
they don't align with Lucene/Solr components.

3. Some of the CHANGES.txt entries draw from multiple JIRA issues.

From dev/trunk/lucene/CHANGES.txt:

Trunk: 9 out of 56 include multiple JIRA issues
3.X: 7/94
3.0.0: 3/29
2.9.0: 9/153

I'm assuming a JIRA dump can't do this.

4. Some JIRA issues appear under multiple change categories in CHANGES.txt.

From dev/trunk/lucene/CHANGES.txt:

Trunk: 3 out of 68 multiply categorized
3.X: 9/102
3.0.0: 1/53
2.9.0: 20/166

A JIRA dump would not allow for multiple issue 
categorization, since JIRA only allows a single issue
type to be assigned - I guess they are assumed to be
mutually exclusive.


Maybe our use of JIRA could be changed to address some of these problems, 
through addition of new fields and/or modification of existing fields' 
allowable values?

Steve



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967010#action_12967010
 ] 

Grant Ingersoll commented on SOLR-1979:
---

bq. I would like to see RFC 3066 instead

Yeah, that makes sense, however, I believe Tika returns 639. (Tika doesn't 
recognize Chinese yet at all).  One approach is we could normalize, I suppose.  
Another is to fix Tika.  I'd really like to see Tika support more languages, 
too.

Longer term, I'd like to not do the fieldName_LangCode thing at all and instead 
let the user supply a string that could have variable substitution if they 
want, something like fieldName_${langCode}, or it could be 
${langCode}_fieldName or it could just be another literal.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
 class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
 <str name="inputFields">name,subject</str>
 <str name="outputField">language_s</str>
 <str name="idField">id</str>
 <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967011#action_12967011
 ] 

Grant Ingersoll commented on SOLR-1979:
---

Another thought, here, is that, over time, this class becomes a base class and 
it becomes easy to replace the language detection piece, that way one gets all 
the infrastructure of this class, but can plugin their own detection.  In fact, 
I'm going to do that right now.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
 class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
 <str name="inputFields">name,subject</str>
 <str name="outputField">language_s</str>
 <str name="idField">id</str>
 <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Monitoring the UI's mem usage

2010-12-05 Thread Mark Miller
This shouldn't normally be something that you need to do with jruby I'd 
think - but Avram asked about this on the call back when there were UI 
running-out-of-memory issues.


Since we require java 6, this is actually really easy.

Java itself comes with jconsole. It should be on your path. You just 
start it, and it lists running java processes that you can connect to. 
Choose the one with jruby-complete-1.5.3.jar in the name for the UI. The 
back end is the one with start.jar in the name.


I usually prefer visualvm over jconsole (kind of a souped-up version of 
jconsole with a mem/cpu profiler). It's free and simple to use at 
https://visualvm.dev.java.net/.


That makes it very easy to see how the UI and back end are using memory, 
their garbage collection activity, cpu usage, etc.


I often run one on my laptop screen as I test LWE.

- Mark

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Testing UpdateProcessorChain

2010-12-05 Thread Grant Ingersoll
Anyone have any thoughts on testing UpdateProcessorChain (and Factory)?  In 
looking at the Signature (dedup) tests, it seems a little clunky, yet the Solr 
base test class adoc (and related methods) don't seem to support specifying the 
Update handler to hit.

Thoughts?

-Grant
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Monitoring the UI's mem usage

2010-12-05 Thread Mark Miller

Gotto love a wrong email address autocomplete.

On 12/5/10 3:26 PM, Mark Miller wrote:

This shouldn't normally be something that you need to do with jruby I'd
think - but Avram asked about this on the call back when there were UI
running-out-of-memory issues.

Since we require java 6, this is actually really easy.

Java itself comes with jconsole. It should be on your path. You just
start it, and it lists running java processes that you can connect to.
Choose the one with jruby-complete-1.5.3.jar in the name for the UI. The
back end is the one with start.jar in the name.

I usually prefer visualvm over jconsole (kind of a souped-up version of
jconsole with a mem/cpu profiler). It's free and simple to use at
https://visualvm.dev.java.net/.

That makes it very easy to see how the UI and back end are using memory,
their garbage collection activity, cpu usage, etc.

I often run one on my laptop screen as I test LWE.

- Mark

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Testing UpdateProcessorChain

2010-12-05 Thread Yonik Seeley
On Sun, Dec 5, 2010 at 3:28 PM, Grant Ingersoll gsing...@apache.org wrote:
 Anyone have any thoughts on testing UpdateProcessorChain (and Factory).  In 
 looking at the Signature (dedup) tests, it seems a little clunky, yet the 
 Solr base test class adoc (and related methods) don't seem to support 
 specifying the Update handler to hit.

You can specify an alternate update processor with any update command.
SolrTestCaseJ4 has this:
  public static String add(XmlDoc doc, String... args) {

so... you should be able to do something like
add(doc("id","10"), "update.processor", "foo")

-Yonik
http://www.lucidimagination.com
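
A fuller sketch of this, assuming the SolrTestCaseJ4 API quoted above (the chain name "mychain" and the config file names are hypothetical):

{code}
import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class CustomChainTest extends SolrTestCaseJ4 {
  @BeforeClass
  public static void beforeClass() throws Exception {
    // solrconfig.xml is assumed to define an
    // updateRequestProcessorChain named "mychain"
    initCore("solrconfig.xml", "schema.xml");
  }

  @Test
  public void testChainIsApplied() {
    // extra varargs to add() become request parameters,
    // routing the add through the named chain
    assertU(add(doc("id", "10"), "update.processor", "mychain"));
    assertU(commit());
    assertQ(req("id:10"), "//result[@numFound='1']");
  }
}
{code}

(As Grant notes in his follow-up, assertU() goes through doLegacyUpdate, which may drop the chain parameter - treat this as a sketch of the intended usage, not a verified recipe.)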

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread DM Smith
Thanks Uwe (and others). We'll adapt.

Is there any interest here in knowing if there are any other problems regarding 
Lucene on Android? From what I see, it is the first mobile platform on which 
Lucene can run.

-- DM

On Dec 5, 2010, at 5:16 AM, Uwe Schindler wrote:

 Hi DM,
 
 In Lucene 3.0.3, NativeFSLockFactory no longer acquires a test lock and does
 not need the process ID anymore, so the java.lang.management package is no
 longer used.
 
 In general, Lucene Java is compatible with the Java 5 SE specification.
 Android uses Harmony, and therefore we cannot guarantee compatibility, as
 Harmony is not TCK tested (but we do test with the latest versions; soon there
 will also be tests on Hudson with Harmony). But only the latest versions of
 Harmony are really compatible with Lucene - previous versions fail lots of
 tests (ask Robert) - and Android phones use very antique versions of Harmony.
 It is not even sure that the Java 5 Memory Model is correctly implemented in
 Dalvik!
 
 About 3.0.2: Of course this version even works with the latest Harmony, so
 Harmony has the java.lang.management package (which is java.lang!!!), so the
 bug is in Android, simply by excluding an SE package. So you should open a bug
 report at Google and then hope that they fix it and that all the phone
 manufacturers like Motor-Roller will update their Android versions.
 
 For your problem: The easy workaround is using Lucene 3.0.3, or simply use
 another LockFactory (Android is single user, so even NoLockFactory would be
 fine in most cases). These are the same limitations as with the NFS
 filesystem. Just use FSDir.open(dir, lockFactory).
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 -Original Message-
 From: DM Smith [mailto:dm-sm...@woh.rr.com]
 Sent: Sunday, December 05, 2010 12:16 AM
 To: dev@lucene.apache.org
 Subject: Exception in migrating from 2.9.x to 3.0.2 on Android
 
 The current code works on Android with 2.9.1, but fails with 3.0.2:
 
 Directory dir = FSDirectory.open(file);
 ...
 do something with directory
 ...
 
 The error we're seeing is:
 12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: java.lang.management.ManagementFactory
 12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:87)
 12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:142)
 12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.store.Directory.makeLock(Directory.java:106)
 12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1058)
 
 Turns out Android does not have java.lang.management.ManagementFactory.
 
 There are several workarounds in client code, but not sure what is best.
 
 The bigger question is whether and how Lucene should be modified to
 accommodate?
 
 Ultimately FSDirectory.open does the following:
    if (Constants.WINDOWS) {
      return new SimpleFSDirectory(path, lockFactory);
    } else {
      return new NIOFSDirectory(path, lockFactory);
    }
 
 Should Android be a supported client OS?
 
 If so, wouldn't it be better not to have OS specific if-then-else and use
 reflection or something else?
 
 Thanks,
  DM
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
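
A minimal sketch of the LockFactory workaround Uwe describes above, assuming the Lucene 3.0.x store API; the wrapper class name is hypothetical, and NoLockFactory is only appropriate under the single-process assumption he states:

{code}
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NoLockFactory;

public class AndroidDirectory {
  // Avoids NativeFSLockFactory, whose test lock needs
  // java.lang.management (missing on Android). NoLockFactory is only
  // safe because a single process accesses the index on the device.
  public static Directory open(File indexDir) throws IOException {
    return FSDirectory.open(indexDir, NoLockFactory.getNoLockFactory());
  }
}
{code}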



Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Mark Miller
I have an interest - don't really care if it uses true java or not. I 
say keep it coming. Where/if it makes sense, why not make lucene work 
better with it. Perhaps that is not possible or too difficult in every 
case - but I'd still like to see the cases pop up. Better than those 
spam wiki update emails.


- Mark

On 12/5/10 3:36 PM, DM Smith wrote:

Thanks Uwe (and others). We'll adapt.

Is there any interest here in knowing if there are any other problems regarding 
Lucene on Android? From what I see, it is the first mobile platform on which 
Lucene can run.

-- DM

On Dec 5, 2010, at 5:16 AM, Uwe Schindler wrote:


Hi DM,

In Lucene 3.0.3, NativeFSLockFactory no longer acquires a test lock and does
not need the process ID anymore, so the java.lang.management package is no
longer used.

In general, Lucene Java is compatible with the Java 5 SE specification.
Android uses Harmony, and therefore we cannot guarantee compatibility, as
Harmony is not TCK tested (but we do test with the latest versions; soon there
will also be tests on Hudson with Harmony). But only the latest versions of
Harmony are really compatible with Lucene - previous versions fail lots of
tests (ask Robert) - and Android phones use very antique versions of Harmony.
It is not even sure that the Java 5 Memory Model is correctly implemented in
Dalvik!

About 3.0.2: Of course this version even works with the latest Harmony, so
Harmony has the java.lang.management package (which is java.lang!!!), so the
bug is in Android, simply by excluding an SE package. So you should open a bug
report at Google and then hope that they fix it and that all the phone
manufacturers like Motor-Roller will update their Android versions.

For your problem: The easy workaround is using Lucene 3.0.3, or simply use
another LockFactory (Android is single user, so even NoLockFactory would be
fine in most cases). These are the same limitations as with the NFS
filesystem. Just use FSDir.open(dir, lockFactory).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-Original Message-
From: DM Smith [mailto:dm-sm...@woh.rr.com]
Sent: Sunday, December 05, 2010 12:16 AM
To: dev@lucene.apache.org
Subject: Exception in migrating from 2.9.x to 3.0.2 on Android

The current code works on Android with 2.9.1, but fails with 3.0.2:

Directory dir = FSDirectory.open(file);
...
do something with directory
...

The error we're seeing is:
12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: java.lang.management.ManagementFactory
12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:87)
12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:142)
12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.store.Directory.makeLock(Directory.java:106)
12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1058)

Turns out Android does not have java.lang.management.ManagementFactory.

There are several workarounds in client code, but not sure what is best.

The bigger question is whether and how Lucene should be modified to
accommodate?

Ultimately FSDirectory.open does the following:
if (Constants.WINDOWS) {
  return new SimpleFSDirectory(path, lockFactory);
} else {
  return new NIOFSDirectory(path, lockFactory);
}

Should Android be a supported client OS?

If so, wouldn't it be better not to have OS specific if-then-else and use
reflection or something else?

Thanks,
DM
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967016#action_12967016
 ] 

Yonik Seeley commented on SOLR-1979:


bq. The new field is made by concatenating the original field name with _ + 
the ISO 639 code. 

This could be problematic given a large set of language codes since they could 
collide with existing dynamic field definitions.
Perhaps something with text in the name also?

Perhaps fieldName_${langCode}Text

Examples:
name_enText
name_frText

It would probably also be nice to be able to map a number of languages to a 
single field - say you have a single analyzer that can handle CJK - then you 
may want that whole collection of languages mapped to a single _cjk field.

And just because you can detect a language doesn't mean you know how to handle 
it differently... so also have an optional catchall that handles all languages 
not specifically mapped.
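
A hypothetical schema.xml sketch of that naming scheme (all the field type names here are placeholders):

{code:xml}
<!-- per-language fields, e.g. name_enText, name_frText -->
<dynamicField name="*_enText" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_frText" type="text_fr" indexed="true" stored="true"/>
<!-- one shared field for the whole CJK collection -->
<dynamicField name="*_cjk" type="text_cjk" indexed="true" stored="true"/>
<!-- catchall for detected-but-unmapped languages -->
<dynamicField name="*_generalText" type="text_general" indexed="true" stored="true"/>
{code}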




 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
 class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
 <str name="inputFields">name,subject</str>
 <str name="outputField">language_s</str>
 <str name="idField">id</str>
 <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967019#action_12967019
 ] 

Robert Muir commented on SOLR-1979:
---

bq. Yeah, that makes sense, however, I believe Tika returns 639.

Right, but 639 is just a subset of 3066 etc. 

So, ignore what Tika does - its 639 identifiers are also valid 3066.

Our API should at least be 3066, Java7/ICU already support BCP47 locale 
identifiers etc, so you get the normalization there for free.

{quote}
It would probably also be nice to be able to map a number of languages to a 
single field say you have a single analyzer that can handle CJK, then you 
may want that whole collection of languages mapped to a single _cjk field.

And just because you can detect a language doesn't mean you know how to handle 
it differently... so also have an optional catchall that handles all languages 
not specifically mapped.
{quote}

Both of these are good reasons why we must avoid 639-1.
We should be able to use things like macrolanguages and undetermined language.





 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
 class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
 <str name="inputFields">name,subject</str>
 <str name="outputField">language_s</str>
 <str name="idField">id</str>
 <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2266) java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord()

2010-12-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967022#action_12967022
 ] 

Yonik Seeley commented on SOLR-2266:


OK, here's my guess: it's probably due to multiple indexed values per field 
value.  ord/rord uses the StringIndex to get the ord values, which can't handle 
multiple indexed tokens per field value.

 The tdate type has a precisionStep > 0, meaning it will index multiple values 
per field value to speed up range queries.
If you don't need faster range queries on this type, then use date instead of 
tdate.

But the ideal fix here is to eliminate the use of ord/rord since they also use 
up more memory... sorting by created will instantiate a per-segment long[] 
FieldCache entry.
It would be nice if that could be reused for the function queries too.  This is 
the case if you use ms().
http://wiki.apache.org/solr/FunctionQuery#ms
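
For instance, the date-boosting recipe from that wiki page, adapted to the created field here (the constant 3.16e-11 is roughly 1 divided by the number of milliseconds in a year):

{code}
bf=recip(ms(NOW,created),3.16e-11,1,1)
{code}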

 java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate 
 field in a boost function with rord()
 

 Key: SOLR-2266
 URL: https://issues.apache.org/jira/browse/SOLR-2266
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4.1
 Environment: Mac OS 10.6
 java version 1.6.0_22
 Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
 Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
Reporter: Peter Wolanin

 I have been testing a switch to long and tdate instead of int and date fields 
 in the schema.xml for our Drupal integration.  This indexes fine, but search 
 fails with a 500 error.
 {code}
 INFO: [d7] webapp=/solr path=/select 
 params={spellcheck=true&facet=true&facet.mincount=1&indent=1&spellcheck.q=term&json.nl=map&wt=json&rows=10&version=1.2&fl=id,entity_id,entity,bundle,bundle_name,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=term&bf=recip(rord(created),4,19,19)^200.0}
  status=500 QTime=4 
 Dec 5, 2010 11:52:28 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ArrayIndexOutOfBoundsException: 39
 at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:721)
 at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
 at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
 at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:61)
 at org.apache.solr.search.function.TopValueSource.getValues(TopValueSource.java:57)
 at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
 at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:123)
 at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
 at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
 at org.apache.lucene.search.Searcher.search(Searcher.java:171)
 at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1101)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
 at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at com.acquia.search.HmacFilter.doFilter(HmacFilter.java:62)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at 
 

Re: Testing UpdateProcessorChain

2010-12-05 Thread Grant Ingersoll

On Dec 5, 2010, at 3:34 PM, Yonik Seeley wrote:

 On Sun, Dec 5, 2010 at 3:28 PM, Grant Ingersoll gsing...@apache.org wrote:
 Anyone have any thoughts on testing UpdateProcessorChain (and Factory).  In 
 looking at the Signature (dedup) tests, it seems a little clunky, yet the 
 Solr base test class adoc (and related methods) don't seem to support 
 specifying the Update handler to hit.
 
 You can specify an alternate update processor with any update command.
 SolrTestCaseJ4 has this:
  public static String add(XmlDoc doc, String... args) {
 
 so... you should be able to do something like
 add(doc("id","10"), "update.processor", "foo")

Yeah, I am calling that.  I think the problem is that assertU() calls 
doLegacyUpdate, which doesn't handle getting the chain.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Robert Muir
what I am saying is that this is a java project, and I don't want to write
to some least common denominator/intersection of java and android. if an api
doesn't exist in android, I could care less. instead, why can't interested
parties have a little project where we port lucene java (perhaps a trivial
patch), set up automated tests etc. this I would be interested in, but let's
keep lucene java as java
On Dec 5, 2010 9:40 PM, Mark Miller markrmil...@gmail.com wrote:
 I have an interest - don't really care if it uses true java or not. I
 say keep it coming. Where/if it makes sense, why not make lucene work
 better with it. Perhaps that is not possible or too difficult in every
 case - but I'd still like to see the cases pop up. Better than those
 spam wiki update emails.

 - Mark

 On 12/5/10 3:36 PM, DM Smith wrote:
 Thanks Uwe (and others). We'll adapt.

 Is there any interest here in knowing if there are any other problems
regarding Lucene on Android? From what I see, it is the first mobile
platform on which Lucene can run.

 -- DM

 On Dec 5, 2010, at 5:16 AM, Uwe Schindler wrote:

 Hi DM,

 In Lucene 3.0.3, NativeFSLockFactory no longer acquires a test lock and does
 not need the process ID anymore, so the java.lang.management package is no
 longer used.

 In general, Lucene Java is compatible with the Java 5 SE specification.
 Android uses Harmony, and therefore we cannot guarantee compatibility, as
 Harmony is not TCK tested (but we do test with the latest versions; soon
 there will also be tests on Hudson with Harmony). But only the latest
 versions of Harmony are really compatible with Lucene - previous versions
 fail lots of tests (ask Robert) - and Android phones use very antique
 versions of Harmony. It is not even sure that the Java 5 Memory Model is
 correctly implemented in Dalvik!

 About 3.0.2: Of course this version even works with the latest Harmony, so
 Harmony has the java.lang.management package (which is java.lang!!!), so the
 bug is in Android, simply by excluding an SE package. So you should open a
 bug report at Google and then hope that they fix it and that all the phone
 manufacturers like Motor-Roller will update their Android versions.

 For your problem: The easy workaround is using Lucene 3.0.3, or simply use
 another LockFactory (Android is single user, so even NoLockFactory would be
 fine in most cases). These are the same limitations as with the NFS
 filesystem. Just use FSDir.open(dir, lockFactory).

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de

 -Original Message-
 From: DM Smith [mailto:dm-sm...@woh.rr.com]
 Sent: Sunday, December 05, 2010 12:16 AM
 To: dev@lucene.apache.org
 Subject: Exception in migrating from 2.9.x to 3.0.2 on Android

 The current code works on Android with 2.9.1, but fails with 3.0.2:

 Directory dir = FSDirectory.open(file);
 ...
 do something with directory
 ...

 The error we're seeing is:
 12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: java.lang.management.ManagementFactory
 12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:87)
 12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:142)
 12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.store.Directory.makeLock(Directory.java:106)
 12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1058)

 Turns out Android does not have java.lang.management.ManagementFactory.

 There are several workarounds in client code, but not sure what is best.

 The bigger question is whether and how Lucene should be modified to
 accommodate?

 Ultimately FSDirectory.open does the following:
 if (Constants.WINDOWS) {
   return new SimpleFSDirectory(path, lockFactory);
 } else {
   return new NIOFSDirectory(path, lockFactory);
 }

 Should Android be a supported client OS?

 If so, wouldn't it be better not to have OS specific if-then-else and
use
 reflection or something else?

 Thanks,
 DM
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
additional
 commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967032#action_12967032
 ] 

Jan Høydahl commented on SOLR-1979:
---

@Robert: Yes, there must be a way to tell whether or not the language even has 
a profile, through some well defined method. It's not important HOW we improve 
detection certainty, but comparing the top n distances could help. I'm also a 
fan of including other metrics than profile similarity if that can help, 
however for unique scripts that will automatically be covered by profile 
similarity. Detailed solution discussions should continue in TIKA-369.

Macro languages: See TIKA-493

It makes sense to allow for detecting languages outside 639-1, and I believe 
RFC3066 and BCP47 are both re-using the 639 codes, so that if there is a 
2-letter code for a language it will be used. 639-1 is what everyone already 
knows.

In general, improvements should be done in Tika space, then use those in Solr, 
thus building one strong language detection library.

@Grant: I actually planned to do the regEx based field name mapping in a 
separate UpdateProcessor, to make things more flexible. Example:
{code:xml} 
  <processor 
class="org.apache.solr.update.processor.LanguageFieldMapperUpdateProcessor">
<str name="languageField">language</str>
<str name="fromRegEx">(.*?)_lang</str>
<str name="toRegEx">$1_$lang</str>
<str name="notSupportedLanguageToRegEx">$1_t</str>
<str name="supportedLanguages">de,en,fr,it,es,nl</str>
  </processor>
{code} 

Your thought of allowing to detect language for individual fields in one go is 
also interesting. I'd love to see metadata support in SolrInputDocument, so 
that one processor could annotate a @language on the fields analyzed. Then the 
next processor could act on that metadata to rename the field...

@Yonik: By allowing regex naming of field names, we give users a generic tool 
to avoid field name clashes, by picking the pattern. Mapping multiple 
languages to the same suffix also makes sense.


 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
 class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
 <str name="inputFields">name,subject</str>
 <str name="outputField">language_s</str>
 <str name="idField">id</str>
 <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1048) Ids parameter and fl=score throws an exception for wt=json

2010-12-05 Thread Jon Bodner (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967035#action_12967035
 ] 

Jon Bodner commented on SOLR-1048:
--

The issue is still present in the 1.4.1 code base for Solr.  I found the source 
of the problem.  In the ids stage for sharding, the score is not calculated (it 
was returned in the previous stage), so the DocSlice's scores float array is 
still null.  XMLWriter and BinaryResponseWriter include lines like:

includeScore = includeScore && ids.hasScores();

but JSONWriter does not. 

This issue is only going to present itself when you are debugging, since I 
think the ids parameter is only used for sharding, and Solr uses the javabin 
wire protocol instead of json.
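
A minimal sketch reproducing the NPE, assuming the Solr 1.4 DocSlice constructor; passing scores == null mimics the ids stage described above:

{code}
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocSlice;

public class DocSliceScoreDemo {
  public static void main(String[] args) {
    // null scores array, as in the "ids" stage of a sharded request
    DocSlice ids = new DocSlice(0, 1, new int[] { 42 }, null, 1, 0.0f);
    System.out.println(ids.hasScores()); // false: the guard the writers should check
    DocIterator it = ids.iterator();
    it.nextDoc();
    it.score(); // NullPointerException, as in the stack trace below
  }
}
{code}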


 Ids parameter and fl=score throws an exception for wt=json
 --

 Key: SOLR-1048
 URL: https://issues.apache.org/jira/browse/SOLR-1048
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Laurent Chavet

 http://yourHost:8080/solr/select/?ids=YourDocId&version=2.2&start=0&rows=10&indent=on&fl=score,id&q=%2B*:*
 shows that when using ids= the score for docs is null; when using wt=json:
 http://yourHost:8080/solr/select/?ids=YourDocId&version=2.2&start=0&rows=10&indent=on&fl=score,id&q=%2B*:*&wt=json
 that throws a NullPointerException:
 HTTP Status 500 - null java.lang.NullPointerException
 at org.apache.solr.search.DocSlice$1.score(DocSlice.java:120)
 at org.apache.solr.request.JSONWriter.writeDocList(JSONResponseWriter.java:490)
 at org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:140)
 at org.apache.solr.request.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:175)
 at org.apache.solr.request.JSONWriter.writeNamedList(JSONResponseWriter.java:288)
 at org.apache.solr.request.JSONWriter.writeResponse(JSONResponseWriter.java:88)
 at org.apache.solr.request.JSONResponseWriter.write(JSONResponseWriter.java:49)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:847)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Mark Miller

On 12/5/10 5:05 PM, Robert Muir wrote:

what I am saying, is that this is a java project, and I don't want to
write to some least common denominator/intersection of java and android.


So don't - DM submitting cases that don't work and you not giving a shit 
are not mutually exclusive.


- Mark


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



FieldCache usage for custom field collapse in solr 1.4

2010-12-05 Thread Adam H.
Hey,
I'm trying to use the lucene FieldCache for some custom field collapsing
implementation: basically i'm collapsing on a non-stored field,
and so am using the fieldcache to retrieve field value instances during run.

I noticed I'm getting some OOMs after deploying it, and after looking into
it for a bit, figured that it might have to do with using a call like this:

StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader,
collapseField);

where 'reader' is the instance of the SolrIndexReader passed along to the
component with the ResponseBuilder.SolrQueryRequest object.

As I understand it, this can double memory usage due to (re)loading this
fieldcache on a reader-wide basis rather than on a per-segment basis?
If so, what would be a way to migrate this code to use a per-segment cache?
I'm not sure I understand the semantics there at all...

Any help will be greatly appreciated, thanks a lot!

Adam
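
A sketch of the per-segment pattern, assuming the Lucene 2.9 reader API that ships with Solr 1.4 (getSequentialSubReaders() returns null when the reader is already a single leaf); the collapse logic itself is elided:

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.FieldCache.StringIndex;

public class PerSegmentCollapse {
  public static void walk(IndexReader topReader, String collapseField) throws IOException {
    IndexReader[] leaves = topReader.getSequentialSubReaders();
    if (leaves == null) {
      leaves = new IndexReader[] { topReader };
    }
    int docBase = 0; // maps segment-local ids back to top-level doc ids
    for (IndexReader leaf : leaves) {
      // cached per segment: after a reopen(), unchanged segments reuse
      // their entries instead of rebuilding one big top-level array
      // (the doubled memory usage described above)
      StringIndex idx = FieldCache.DEFAULT.getStringIndex(leaf, collapseField);
      for (int doc = 0; doc < leaf.maxDoc(); doc++) {
        String value = idx.lookup[idx.order[doc]];
        // collapse on (docBase + doc, value) here
      }
      docBase += leaf.maxDoc();
    }
  }
}
{code}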


Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Robert Muir
On Sun, Dec 5, 2010 at 6:10 PM, Mark Miller markrmil...@gmail.com wrote:
 On 12/5/10 5:05 PM, Robert Muir wrote:

 what I am saying, is that this is a java project, and I don't want to
 write to some least common denominator/intersection of java and android.

 So don't - DM submitting cases that don't work and you not giving a shit are
 not mutually exclusive.


Just trying to say, I don't think we should change the programming
language of the project without a proper vote.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Mark Miller

bq.  Perhaps that is not possible or too difficult in every case

To clarify - it sounds like I could be saying: well, perhaps we can't 
improve every case, but some we can. I'm saying *too difficult in every 
case* - even if we don't try and *fix a single case* - it's still 
beneficial for you to report and discuss these issues IMO. And as I 
said, I'll remain interested.



- Mark

On 12/5/10 3:40 PM, Mark Miller wrote:

I have an interest - don't really care if it uses true java or not. I
say keep it coming. Where/if it makes sense, why not make lucene work
better with it. Perhaps that is not possible or too difficult in every
case - but I'd still like to see the cases pop up. Better than those
spam wiki update emails.

- Mark

On 12/5/10 3:36 PM, DM Smith wrote:

Thanks Uwe (and others). We'll adapt.

Is there any interest here in knowing if there are any other problems
regarding Lucene on Android? From what I see, it is the first mobile
platform on which Lucene can run.

-- DM

On Dec 5, 2010, at 5:16 AM, Uwe Schindler wrote:


Hi DM,

In Lucene 3.0.3, NativeFSLockFactory no longer acquires a test lock and
does not need the process ID anymore, so the java.lang.management package is
no longer used.

In general, Lucene Java is compatible with the Java 5 SE specification.
Android uses Harmony, and therefore we cannot guarantee compatibility, as
Harmony is not TCK tested (but we do test with the latest versions; soon
there will also be tests on Hudson with Harmony). But only the latest
versions of Harmony are really compatible with Lucene - previous versions
fail lots of tests (ask Robert) - and Android phones use very antique
versions of Harmony. It is not even sure that the Java 5 Memory Model is
correctly implemented in Dalvik!

About 3.0.2: Of course this version even works with the latest Harmony, so
Harmony has the java.lang.management package (which is java.lang!!!), so
the bug is in Android, simply by excluding an SE package. So you should open
a bug report at Google and then hope that they fix it and that all the phone
manufacturers like Motor-Roller will update their Android versions.

For your problem: The easy workaround is using Lucene 3.0.3, or simply
use another LockFactory (Android is single user, so even NoLockFactory
would be fine in most cases). These are the same limitations as with the NFS
filesystem. Just use FSDir.open(dir, lockFactory).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-Original Message-
From: DM Smith [mailto:dm-sm...@woh.rr.com]
Sent: Sunday, December 05, 2010 12:16 AM
To: dev@lucene.apache.org
Subject: Exception in migrating from 2.9.x to 3.0.2 on Android

The current code works on Android with 2.9.1, but fails with 3.0.2:

Directory dir = FSDirectory.open(file);
...
do something with directory
...

The error we're seeing is:
12-04 21:34:41.629: WARN/System.err(23160): java.lang.NoClassDefFoundError: java.lang.management.ManagementFactory
12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:87)
12-04 21:34:41.639: WARN/System.err(23160): at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:142)
12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.store.Directory.makeLock(Directory.java:106)
12-04 21:34:41.649: WARN/System.err(23160): at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1058)

Turns out Android does not have java.lang.management.ManagementFactory.

There are several workarounds in client code, but not sure what is best.

The bigger question is whether and how Lucene should be modified to
accommodate?

Ultimately FSDirectory.open does the following:
if (Constants.WINDOWS) {
  return new SimpleFSDirectory(path, lockFactory);
} else {
  return new NIOFSDirectory(path, lockFactory);
}

Should Android be a supported client OS?

If so, wouldn't it be better not to have OS specific if-then-else
and use
reflection or something else?

Thanks,
DM
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
additional
commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org






-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Mark Miller

On 12/5/10 6:15 PM, Robert Muir wrote:

On Sun, Dec 5, 2010 at 6:10 PM, Mark Millermarkrmil...@gmail.com  wrote:

On 12/5/10 5:05 PM, Robert Muir wrote:


what I am saying, is that this is a java project, and I don't want to
write to some least common denominator/intersection of java and android.


So don't - DM submitting cases that don't work and you not giving a shit are
not mutually exclusive.



Just trying to say, i dont think we should change the programming
language of the project without a proper vote.



Then you're just overreacting again.

Allow me to sum up for you:

DM: hey, we are trying to use lucene on android - this is not working
Uwe and someone: that's not real java, we don't support it
Rmuir : **%$$!! (kidding - I don't remember what you said)
DM: Oh, pardon me. Well okay - but would anyone be interested in us 
reporting what doesn't work as we go through this? Android is the only 
mobile platform lucene works on I think.

Mark: Oh yeah - interesting - please do. I'd be interested in seeing.
Rmuir: don't change the lucene impl language without a vote! Gr!
Mark: ??
Native Police: why are you so aggressive?


- Mark

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967046#action_12967046
 ] 

Grant Ingersoll commented on SOLR-1979:
---

bq. @Grant: I actually planned to do the regEx based field name mapping in a 
separate UpdateProcessor, to make things more flexible

I don't really see that it makes it any more flexible.  If it were a general 
purpose mapper, maybe, but since it is tied to the language field, why not just 
put it in the language processor?  I've already got the method that chooses the 
output field as a protected method.  With that, one merely needs to extend it 
to provide an alternate method from what you have proposed.
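
For example (a hypothetical sketch - the actual class and method names in the patch may differ), a subclass overriding such a protected hook:

{code}
// Assumes the patch exposes a protected getOutputField(fieldName, langCode)
// on the processor; both names here are hypothetical.
public class MyLangIdProcessor extends LanguageIdentifierUpdateProcessor {
  @Override
  protected String getOutputField(String fieldName, String langCode) {
    // ${langCode}_fieldName instead of the default fieldName_${langCode}
    return langCode + "_" + fieldName;
  }
}
{code}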

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
 class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
 <str name="inputFields">name,subject</str>
 <str name="outputField">language_s</str>
 <str name="idField">id</str>
 <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1979:
--

Attachment: SOLR-1979.patch

Here's a patch that passes the tests.  Note, I modified the Solr base test case 
to have some new methods to properly call update handlers and then validate the 
results.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
 class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
 <str name="inputFields">name,subject</str>
 <str name="outputField">language_s</str>
 <str name="idField">id</str>
 <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967048#action_12967048
 ] 

Grant Ingersoll commented on SOLR-1979:
---

Note, the patch still needs more tests and needs to check headers, etc. as well 
as the better field mapping and the proper language support that Robert is 
talking about.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect the language of some random text in order to act 
 upon it, such as indexing the content into language-aware fields. Another 
 use case is to be able to filter/facet on language for random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
     class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from the input fields name and subject, perform 
 language identification, and output the ISO code for the detected language in 
 the outputField. If no language was detected, the fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap

2010-12-05 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967057#action_12967057
 ] 

Nick Pellow commented on LUCENE-2235:
-

I just upgraded to 3.0.3 and we started getting NullPointerExceptions coming 
from PerFieldAnalyzerWrapper.
We have a PerFieldAnalyzerWrapper that has a null defaultAnalyzer:
{code}
private final PerFieldAnalyzerWrapper analyzer = new 
PerFieldAnalyzerWrapper(null);
{code}

We add analyzers for all fields that are analyzed, i.e. field.isAnalyzed() == 
true.
getOffsetGap() on PerFieldAnalyzerWrapper is being called even for the 
non-analyzed fields. Is this expected behaviour?

Lines 200-203 of DocInverterPerField are: 
{code}
if (anyToken)
  fieldState.offset += docState.analyzer.getOffsetGap(field);
fieldState.boost *= field.getBoost();
  }

{code}
Should this check that the field is indeed analyzed before calling 
getOffsetGap()?
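
For illustration, the guard being asked about might look something like this (a hypothetical sketch, not committed code; the exact accessor on the field, e.g. isTokenized() vs. isAnalyzed(), is an assumption):

{code}
// Hypothetical guard (an assumption, not a committed patch): only apply the
// offset gap for fields whose text actually went through an analyzer.
if (anyToken && field.isTokenized()) {
  fieldState.offset += docState.analyzer.getOffsetGap(field);
}
fieldState.boost *= field.getBoost();
{code}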


 implement PerFieldAnalyzerWrapper.getOffsetGap
 --

 Key: LUCENE-2235
 URL: https://issues.apache.org/jira/browse/LUCENE-2235
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.0
 Environment: Any
Reporter: Javier Godoy
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9.4, 3.0.3, 3.1, 4.0

 Attachments: LUCENE-2235.patch, PerFieldAnalyzerWrapper.patch


 PerFieldAnalyzerWrapper does not delegate calls to getOffsetGap(Fieldable); 
 instead it returns the default values from the base Analyzer implementation. 
 (Similar to LUCENE-659, PerFieldAnalyzerWrapper fails to implement 
 getPositionIncrementGap.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2599) Deprecate Spatial Contrib

2010-12-05 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967064#action_12967064
 ] 

Chris Male commented on LUCENE-2599:


I just noticed that Solr depends upon some methods in DistanceUtils. We'll 
need to move those into the module before removing the contrib from 4.x.

 Deprecate Spatial Contrib
 -

 Key: LUCENE-2599
 URL: https://issues.apache.org/jira/browse/LUCENE-2599
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Affects Versions: 4.0
Reporter: Chris Male
 Attachments: LUCENE-2599.patch, LUCENE-2599.patch


 The spatial contrib is blighted by bugs. The latest series, found by Grant 
 and discussed 
 [here|http://search.lucidimagination.com/search/document/c32e81783642df47/spatial_rethinking_cartesian_tiers_implementation], 
 shows that we need to re-think the cartesian tier implementation.
 Given the need to create a spatial module containing code taken from both 
 lucene and Solr, it makes sense to deprecate the spatial contrib, and start 
 from scratch in the new module.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Exception in migrating from 2.9.x to 3.0.2 on Android

2010-12-05 Thread Robert Muir
On Sun, Dec 5, 2010 at 9:12 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 I personally consider android a valid platform for lucene and we
 should try to reduce the pain for android folks as much as possible.
 Changing supported platforms is a totally different thing to me.


Good, you can start a separate subproject as a port then.

But until then, Android isn't supported by lucene-java.
Android is a different programming language, and by supporting it, we
change the programming language of the lucene-java project.

This requires a vote; until then, it's not supported, by definition,
since our documented programming language is Java, not Android.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967076#action_12967076
 ] 

Robert Muir commented on SOLR-1979:
---

{quote}
It makes sense to allow for detecting languages outside 639-1, and I believe 
RFC3066 and BCP47 are both re-using the 639 codes, so that if there is a 
2-letter code for a language it will be used. 639-1 is what everyone already 
knows.

In general, improvements should be done in Tika space, then use those in Solr, 
thus building one strong language detection library.
{quote}

Yes they do; the 639-1 codes that Tika outputs are also valid BCP47 codes :)

But in Solr, when designing up front, I was just saying we shouldn't limit any 
abstract portion to 639-1 when another implementation might support RFC 3066 or 
BCP47... we should make sure we allow that.


 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect the language of some random text in order to act 
 upon it, such as indexing the content into language-aware fields. Another 
 use case is to be able to filter/facet on language for random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
   <processor 
     class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
     <str name="inputFields">name,subject</str>
     <str name="outputField">language_s</str>
     <str name="idField">id</str>
     <str name="fallback">en</str>
   </processor>
 {code} 
 It will then read the text from the input fields name and subject, perform 
 language identification, and output the ISO code for the detected language in 
 the outputField. If no language was detected, the fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-05 Thread Mattmann, Chris A (388J)
Hi Steven,

Yep, like you state below JIRA *could* be configured to deal with this. 

In all honesty, putting tons of thought and effort into how to precisely deal 
with the changes you specify below might be somewhat overkill.

Cheers,
Chris

On Dec 5, 2010, at 12:17 PM, Steven A Rowe wrote:

 On 12/5/2010 at 12:19 PM, Robert Muir wrote:
 On Sun, Dec 5, 2010 at 12:08 PM, Mattmann, Chris A (388J)
 chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Mark,
 
 RE: the credit system. JIRA provides a contribution report here, like
 this one that I generated for Lucene 3.1:
 
 
 My concern with this is that it leaves out important email contributors.
 
 I agree, this is a serious problem.
 
 My additional problems with JIRA-generated changes:
 
 1. Huge undifferentiated change lists are frightening and nearly useless, 
 regardless of the quality of the descriptions.
 
   JIRA's issue types are:

   Bug, New Feature, Improvement, Test, Wish, Task
 
   Even if we used JIRA's issue types to group issues, they
   are not the same as Lucene's CHANGES.txt issue types:
 
   Changes in backwards compatibility policy, 
   Changes in runtime behavior, 
   API Changes, Documentation, Bug fixes, New features,
   Optimizations, Build, Test Cases, Infrastructure
 
   (I left out Requirements, last used in 2006 under release
   1.9 RC1, since Build seems to have replaced it.)
 
 2. There are now four separate CHANGES.txt files in the Lucene code base, 
 excluding Solr and its modules (each of which has one of them).  This number 
 will only grow as more Lucene contribs become modules.
 
   The JIRA project components list is outdated / incomplete
   / has different granularity than the CHANGES.txt locations,
   so using it to group JIRA issues would not work because
   they don't align with Lucene/Solr components.
 
 3. Some of the CHANGES.txt entries draw from multiple JIRA issues.
 
   From dev/trunk/lucene/CHANGES.txt:
 
   Trunk: 9 out of 56 include multiple JIRA issues
   3.X: 7/94
   3.0.0: 3/29
   2.9.0: 9/153
 
   I'm assuming a JIRA dump can't do this.
 
 4. Some JIRA issues appear under multiple change categories in CHANGES.txt.
 
   From dev/trunk/lucene/CHANGES.txt:
 
   Trunk: 3 out of 68 multiply categorized
   3.X: 9/102
   3.0.0: 1/53
   2.9.0: 20/166
 
   A JIRA dump would not allow for multiple issue 
   categorization, since JIRA only allows a single issue
   type to be assigned - I guess they are assumed to be
   mutually exclusive.
 
 
 Maybe our use of JIRA could be changed to address some of these problems, 
 through addition of new fields and/or modification of existing fields' 
 allowable values?
   
 Steve
 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Changes Mess

2010-12-05 Thread Steven A Rowe
Hi Chris,

On 12/5/2010 at 10:36 PM, Chris Mattman wrote:
 Yep, like you state below JIRA *could* be configured to deal with this.
 
 In all honesty, putting tons of thought and effort into how to precisely
 deal with the changes you specify below might be somewhat overkill.

I think dumping CHANGES.txt in favor of output from a badly misconfigured issue 
tracking system would be foolish.  

One way to deal with the problem is to stay with CHANGES.txt.  (We've been down 
this road before, and this is where we landed in the past.)

Another would be to fix the issue tracking system.

Yet another way would be to declare the problem non-existent and screw our 
users by insulting them with a honking great mass of changes without any 
indication about what they are or how they are inter-related.  (You won't be 
surprised at this point, I think, by my -1 to this.)

Steve



Re: Changes Mess

2010-12-05 Thread Mattmann, Chris A (388J)
 
 Yet another way would be to declare the problem non-existent and screw our 
 users by insulting them with a honking great mass of changes without any 
 indication about what they are or how they are inter-related.  (You won't be 
 surprised at this point, I think, by my -1 to this.)

Right, I'm one of those users (I have been in the past, and somewhat still am), 
as well as a former member of the PMC. So acting as if I'm suggesting we screw 
them over (them including me), simply because I suggest that solving this mess 
completely is intractable and you have to go with a heuristic (which I'd argue 
isn't worth spending oodles of time on), is also a bit insulting.

I suggested that JIRA can handle this. We're using it in, oh, about 2-3 Apache 
projects I'm on, and it's working great. If you think it's a mess for all the 
reasons you put in the email, great, that's your prerogative. I'm just saying 
that in my experience it hasn't been that bad.

Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1395) Integrate Katta

2010-12-05 Thread JohnWu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967086#action_12967086
 ] 

JohnWu commented on SOLR-1395:
--

TomLiu:

I am still stuck at the query dispatch to the subproxy!

SEVERE: Error calling public abstract org.apache.solr.katta.KattaResponse 
org.apache.solr.katta.ISolrServer.request(java.lang.String[],org.apache.solr.katta.KattaRequest)
 throws java.lang.Exception on pc-slave02:2 (try # 1 of 3) (id=0)
java.lang.reflect.InvocationTargetException

So here is my proxy configuration; please review it:

1) solrHome - solrconfig.xml:

<config>
  <requestHandler name="standard" class="solr.KattaRequestHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="shards">*</str>
    </lst>
  </requestHandler>
</config>

OK, all the shards are watched and held in ZooKeeper, as seen through zkCli.sh:

[zk: pc-master(CONNECTED) 11] ls /katta/shard-to-nodes
[SPIndex05#1287138886138-99384445, SPIndex04#1287138886138-99384445]

2) In the proxy, katta.node.properties:

node.server.class=net.sf.katta.lib.lucene.LuceneServer

3) The query:

http://localhost:8080/solr-1395-katta-0.6.2-2patch/select/?q=lovealice&version=2.2&start=0&rows=10&indent=on&isShard=false&distrib=true

Is that right? Especially step 2?

Thanks!

JohnWu



 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, 
 katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, 
 katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, 
 solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, 
 solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, 
 solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, 
 SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management
 * Zookeeper based failover
 * Indexes may be built using Hadoop

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

2010-12-05 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967100#action_12967100
 ] 

Shai Erera commented on LUCENE-2471:


At some point IndexInput/Output.copyBytes did use a FileChannel optimization in 
FSDirectory, but that caused trouble, I think, when the copying thread was 
interrupted. So it was removed and we were left with the default impl.
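
For illustration, the default buffered copy being discussed could look roughly like this. It is a sketch under my own assumptions (buffer size, the method's exact home on IndexOutput), not the committed implementation; only readBytes/writeBytes are existing APIs.

{code}
// Rough sketch (an assumption, not the committed code) of the default
// buffered copy the issue describes; this would live on IndexOutput.
public void copyBytes(IndexInput input, long numBytes) throws IOException {
  byte[] buffer = new byte[4096];  // intermediate buffer an override can skip
  while (numBytes > 0) {
    int chunk = (int) Math.min(buffer.length, numBytes);
    input.readBytes(buffer, 0, chunk); // pull from the source
    writeBytes(buffer, 0, chunk);      // push into this output
    numBytes -= chunk;
  }
}
{code}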

 Supporting bulk copies in Directory
 ---

 Key: LUCENE-2471
 URL: https://issues.apache.org/jira/browse/LUCENE-2471
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Earwin Burrfoot
 Fix For: 3.1, 4.0


 A method can be added to IndexOutput that accepts an IndexInput and writes 
 bytes using it as a source.
 This should be used for bulk-merge cases (offhand: norms, docstores?). Some 
 Directories can then override the default impl and skip the intermediate 
 buffers (NIO, MMap, RAM?).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2800) Search Index Generation fails

2010-12-05 Thread Sunitha Belavagi (JIRA)
Search Index Generation fails
-

 Key: LUCENE-2800
 URL: https://issues.apache.org/jira/browse/LUCENE-2800
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.0.0
 Environment: Windows Server 2003 
Reporter: Sunitha Belavagi


Hi,

We are using Lucene 2.0.0 for the search index in our Comergent application.
It had been working fine for more than 3 years.
Since this week, it has been throwing an exception while creating a new index 
and also for the incremental index.
Below is the exception:


com.comergent.api.appservices.productService.ProductServiceException: 
java.io.IOException: Cannot delete 
...\searchIndex\en_US\MasterIndex_602580\segments 
at 
com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.indexPCFromCache(CatalogIndexSetBuilder.java:634)
 
at 
com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.buildIndexSet(CatalogIndexSetBuilder.java:276)
 
at 
com.comergent.appservices.search.indexBuilder.IndexSetBuilder$BuilderThread.run(IndexSetBuilder.java:469)
 
Caused by: java.io.IOException: Cannot delete 
searchIndex\en_US\MasterIndex_602580\segments 
at org.apache.lucene.store.FSDirectory.renameFile(FSDirectory.java:268) 
at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:95) 
at org.apache.lucene.index.IndexWriter$4.doBody(IndexWriter.java:726) 
at org.apache.lucene.store.Lock$With.run(Lock.java:99) 
at 
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:724) 
at 
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:686) 
at 
org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:674) 
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:479) 
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:462) 
at 
com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.indexPCFromCache(CatalogIndexSetBuilder.java:630)
 
... 2 more 
2010.12.05 06:25:13:532 Env/Thread-21961:ERROR:CatalogIndexSetBuilder 
CatalogIndexSetBuilder: [MasterIndex_602580] - Exception: 
com.comergent.api.appservices.productService.ProductServiceException: 
java.io.IOException: Cannot delete ...\MasterIndex_602580\segments
2010.12.05 06:25:13:532 Env/Thread-21961:INFO:CMGT_SEARCH 
IndexSetBuilder$BuilderThread: error building the index for: MasterIndex_602580
com.comergent.api.exception.ComergentException: 
com.comergent.api.appservices.productService.ProductServiceException: 
java.io.IOException: Cannot delete 
\searchIndex\en_US\MasterIndex_602580\segments
at 
com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.buildIndexSet(CatalogIndexSetBuilder.java:305)
at 
com.comergent.appservices.search.indexBuilder.IndexSetBuilder$BuilderThread.run(IndexSetBuilder.java:469)
Caused by: 
com.comergent.api.appservices.productService.ProductServiceException: 
java.io.IOException: Cannot delete ...\MasterIndex_602580\segments
at 
com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.indexPCFromCache(CatalogIndexSetBuilder.java:634)
at 
com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.buildIndexSet(CatalogIndexSetBuilder.java:276)
... 1 more
Caused by: java.io.IOException: Cannot delete ...\MasterIndex_602580\segments
at org.apache.lucene.store.FSDirectory.renameFile(FSDirectory.java:268)
at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:95)
at org.apache.lucene.index.IndexWriter$4.doBody(IndexWriter.java:726)
at org.apache.lucene.store.Lock$With.run(Lock.java:99)
at 
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:724)
at 
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:686)
at 
org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:674)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:479)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:462)
at 
com.comergent.reference.appservices.productService.search.indexBuilder.CatalogIndexSetBuilder.indexPCFromCache(CatalogIndexSetBuilder.java:630)
... 2 more

2010.12.05 06:25:13:938 Env/http-8080-Processor75:INFO:CMGT_SEARCH 
IndexSetBuilder: error building the index: 
com.comergent.api.appservices.search.exception.IndexingException: Error in 
executing some builder threads...
at 
com.comergent.appservices.search.indexBuilder.IndexSetBuilder.monitor(IndexSetBuilder.java:440)
at 
com.comergent.appservices.search.indexBuilder.IndexSetBuilder.build(IndexSetBuilder.java:185)
at 

[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap

2010-12-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967115#action_12967115
 ] 

Uwe Schindler commented on LUCENE-2235:
---

Hi Nick,
thanks for reporting this. Your problem only occurs because the previously 
missing method was added (before, PFAW just returned some default; now it throws 
an NPE in that case).

In general, Lucene does not support *null* analyzers anywhere (not as a ctor 
argument in IW/IWC, nor e.g. here). You should always pass a simple analyzer 
(WhitespaceAnalyzer, SimpleAnalyzer, KeywordAnalyzer) to IndexWriter or to 
other methods taking an Analyzer.

To really fix this, we would have to review all places that don't need to call 
analyzers. There are other such places, too: e.g., when you pass a TokenStream 
directly to the Field with new Field(name, TokenStream), it also calls the 
analyzer, so you have to implement it.
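
As an illustration of that advice, a minimal sketch of the null-free setup (the field name and analyzer choices here are just placeholders, not anything from the patch):

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class NonNullWrapperExample {
  static Analyzer buildAnalyzer() {
    // Use a harmless concrete default instead of null, as suggested above.
    PerFieldAnalyzerWrapper wrapper =
        new PerFieldAnalyzerWrapper(new KeywordAnalyzer());
    // Per-field overrides still work as before ("body" is a placeholder).
    wrapper.addAnalyzer("body", new WhitespaceAnalyzer());
    return wrapper;
  }
}
{code}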

 implement PerFieldAnalyzerWrapper.getOffsetGap
 --

 Key: LUCENE-2235
 URL: https://issues.apache.org/jira/browse/LUCENE-2235
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.0
 Environment: Any
Reporter: Javier Godoy
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9.4, 3.0.3, 3.1, 4.0

 Attachments: LUCENE-2235.patch, PerFieldAnalyzerWrapper.patch


 PerFieldAnalyzerWrapper does not delegate calls to getOffsetGap(Fieldable); 
 instead it returns the default values from the base Analyzer implementation. 
 (Similar to LUCENE-659, PerFieldAnalyzerWrapper fails to implement 
 getPositionIncrementGap.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1395) Integrate Katta

2010-12-05 Thread tom liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12967118#action_12967118
 ] 

tom liu commented on SOLR-1395:
---

In the proxy, katta.node.properties should be:

#node.server.class=net.sf.katta.lib.lucene.LuceneServer
node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

You must put apache-solr-core-XXX.jar into katta's lib directory, along with 
the related jars.

 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, 
 katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, 
 katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, 
 solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, 
 solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, 
 solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, 
 SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management
 * Zookeeper based failover
 * Indexes may be built using Hadoop

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org