[jira] [Comment Edited] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-15 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872573#comment-13872573
 ] 

Anshum Gupta edited comment on SOLR-5477 at 1/16/14 7:31 AM:
-

I have a few questions regarding my approach for making the CoreAdmin calls 
async:

Approach #1:
* CoreAdmin requests get submitted to zk.
* Each core watches its zk node for submitted tasks. The request object is the 
data in the node (written when the task is submitted).
* On completion, the core deletes the submitted task node and puts a new node 
with the response and other metadata into zk.
* The Collection API watches the node when it submits a task and waits for it 
to complete.
* On completion of the Collection API call, delete all related core admin 
request nodes in zk that were generated.

* Cleaning up of request nodes in zk happens through an explicit API call.
* Having something along the following lines in zk would be helpful:

/tasks
  /collections/collection1/task1
  /cores/core1/collection1/task1/coretask1
  ...

This would let us delete the entire group of tasks associated with a 
core/collection/core task/collection task. (A minimal sketch of the 
submit/watch flow follows.)

Questions:
* This move would mean having a lot more clients talking to and writing to zk. 
Does this approach make sense given the intended direction of SolrCloud?
* Any suggestions/concerns about zk's scalability with this many clients 
sending updates to it?

Approach #2:
Continue accepting the request as we do now, but:
# Have the call return immediately.
# Use zk only to track/store the status (persistence). Request-status calls 
still come to the core, and the core fetches the status from zk, instead of 
the client being intelligent and talking directly to zk.

This approach is certainly less intrusive, but it also forgoes the benefit of 
having the client just watch a particular zk node for task state changes.


Approach #3 (not the best option; more of a fallback if zk has scalability 
issues with everyone writing/watching):
* Don't make the CoreAdmin calls async; instead, introduce a tracking mode. 
Once the task is submitted [with async = "taskid"], track this request using 
an in-memory data structure. Even if the request times out, the client can go 
back and query the task status. (A rough sketch follows.)
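A rough sketch of what such a tracking mode could look like (every name here 
is made up for illustration; nothing below is the actual patch):

{code:java}
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory tracker for Approach #3. Nothing is persisted, so
// state is lost on restart -- one reason Approaches #1 and #2 use zk instead.
public class TaskTracker {
  public enum Status { SUBMITTED, RUNNING, COMPLETED, FAILED }

  private final ConcurrentHashMap<String, Status> tasks =
      new ConcurrentHashMap<String, Status>();

  public void submit(String taskId)           { tasks.put(taskId, Status.SUBMITTED); }
  public void update(String taskId, Status s) { tasks.put(taskId, s); }

  // The client can poll this even after its original request timed out.
  public Status status(String taskId)         { return tasks.get(taskId); }
}
{code}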


was (Author: anshumg):
(The earlier revision of this comment was identical, except that Approach #2 
above was not yet present and the tracking-mode option was numbered 
Approach #2.)

> Async execution of OverseerCollectionProcessor tasks
> 
>
> Key: SOLR-5477
> URL: https://issues.apache.org/jira/browse/SOLR-5477
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Anshum Gupta
> Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477.patch
>
>
> Typical collection admin commands are long running and it is very common to 
> have the requests get timed out. It is more of a problem if the cluster is 
> very large. Add an option to run these commands asynchronously:
> add an extra param async=true for all collection commands;
> the task is written to ZK and the caller is returned a task id;
> a separate collection admin command will be added to poll the status of the 
> task:
> command=status&id=7657668909
> if id is not passed all running async tasks should be 
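To make the quoted proposal concrete, a client-side flow might look like the 
sketch below (the endpoint path and everything beyond the async and 
command=status&id= parameters are assumptions; response parsing is elided):

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Hypothetical client flow for the proposed async collection commands.
public class AsyncAdminSketch {
  static String get(String url) throws Exception {
    BufferedReader in = new BufferedReader(new InputStreamReader(new URL(url).openStream()));
    StringBuilder out = new StringBuilder();
    for (String line; (line = in.readLine()) != null; ) out.append(line).append('\n');
    in.close();
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    String admin = "http://localhost:8983/solr/admin/collections"; // assumed endpoint
    // Submit with async=true; per the proposal the call returns a task id
    // immediately instead of blocking until the command finishes.
    String response = get(admin + "?action=RELOAD&name=collection1&async=true");
    String taskId = response.trim(); // pretend the response is just the id
    // Poll with the separate status command from the proposal.
    System.out.println(get(admin + "?command=status&id=" + taskId));
  }
}
{code}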

[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 1864 - Failure

2014-01-15 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/1864/

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.spelling.suggest.TestFreeTextSuggestions

Error Message:
o=-9223372036854775808

Stack Trace:
java.lang.AssertionError: o=-9223372036854775808
at __randomizedtesting.SeedInfo.seed([CB0AA5A193FA1857]:0)
at 
org.apache.lucene.util.fst.PositiveIntOutputs.valid(PositiveIntOutputs.java:104)
at 
org.apache.lucene.util.fst.PositiveIntOutputs.common(PositiveIntOutputs.java:47)
at 
org.apache.lucene.util.fst.PositiveIntOutputs.common(PositiveIntOutputs.java:32)
at org.apache.lucene.util.fst.Builder.add(Builder.java:422)
at 
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.build(FreeTextSuggester.java:359)
at 
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.build(FreeTextSuggester.java:278)
at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:167)
at 
org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:142)
at 
org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:163)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1913)
at org.apache.solr.util.TestHarness.query(TestHarness.java:291)
at org.apache.solr.util.TestHarness.query(TestHarness.java:273)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:618)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:611)
at 
org.apache.solr.spelling.suggest.TestFreeTextSuggestions.beforeClass(TestFreeTextSuggestions.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:677)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:724)




Build Log:
[...truncated 11946 lines...]
   [junit4] Suite: org.apache.solr.spelling.suggest.TestFreeTextSuggestions
   [junit4]   2> 2287201 T5170 oas.SolrTestCaseJ4.initCore initCore
   [junit4]   2> Creating dataDir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java7/solr/build/solr-core/test/J0/./solrtest-TestFreeTextSuggestions-1389855703745
   [junit4]   2> 2287203 T5170 oasc.SolrResourceLoader.<init> new 
SolrResourceLoader for directory: 
'/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java7/solr/build/solr-core/test-files/solr/collection1/'
   [junit4]   2> 2287204 T5170 oasc.SolrResourceLoader.replaceClassLoader 
Adding 
'file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java7/solr/build/solr-core

[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.7.0_60-ea-b02) - Build # 3602 - Still Failing!

2014-01-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3602/
Java: 32bit/jdk1.7.0_60-ea-b02 -client -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 50540 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:459: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:398: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\extra-targets.xml:87: 
The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\extra-targets.xml:187: 
Source checkout is dirty after running tests!!! Offending files:
* ./solr/licenses/jackson-core-asl-1.7.4.jar.sha1
* ./solr/licenses/jackson-mapper-asl-1.7.4.jar.sha1
* ./solr/licenses/jersey-core-1.16.jar.sha1

Total time: 105 minutes 35 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.7.0_60-ea-b02 -client -XX:+UseSerialGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




Re: Lucene / Solr 4.6.1

2014-01-15 Thread Mark Miller
Whoops - just built this rc with ant 1.9.2 and smoke tester still wants
just 1.8. I'll start another build tonight and send the vote thread in the
morning.

- Mark


On Wed, Jan 15, 2014 at 3:14 PM, Simon Willnauer
wrote:

> +1
>
> On Wed, Jan 15, 2014 at 8:02 PM, Mark Miller 
> wrote:
> > Unless there is an objection, I’m going to try and make a first RC
> > tonight.
> >
> > - Mark


-- 
- Mark


[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60-ea-b02) - Build # 9009 - Still Failing!

2014-01-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9009/
Java: 32bit/jdk1.7.0_60-ea-b02 -server -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 50419 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:459: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:398: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:87: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:187: Source 
checkout is dirty after running tests!!! Offending files:
* ./solr/licenses/jackson-core-asl-1.7.4.jar.sha1
* ./solr/licenses/jackson-mapper-asl-1.7.4.jar.sha1
* ./solr/licenses/jersey-core-1.16.jar.sha1

Total time: 54 minutes 6 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.7.0_60-ea-b02 -server -XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




Flexible Standard Query Parser vs. Classic QP

2014-01-15 Thread Otis Gospodnetic
Hi,

Does anyone know the reason the "flexible standard query parser" (I think
that's the name) that Adriano Crestani worked on a few years ago never
replaced the old/classic QP?

Just curious because I noticed Adriano working on
https://issues.apache.org/jira/browse/LUCENE-5344

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


[jira] [Commented] (LUCENE-5344) Flexible StandardQueryParser behaves differently than ClassicQueryParser

2014-01-15 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873025#comment-13873025
 ] 

Adriano Crestani commented on LUCENE-5344:
--

Good catch Michael! I had the change, I just missed it during the commit (not 
sure how).

> Flexible StandardQueryParser behaves differently than ClassicQueryParser
> 
>
> Key: LUCENE-5344
> URL: https://issues.apache.org/jira/browse/LUCENE-5344
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 4.5
>Reporter: Krishna Keldec
>Assignee: Adriano Crestani
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5344_adrianocrestani_2014-01-12.patch, 
> LUCENE-5344_adrianocrestani_2014-01-14.patch, 
> LUCENE-5344_adrianocrestani_2014-01-14_branch_4x.patch
>
>
> AnalyzerQueryNodeProcessor creates a BooleanQueryNode instead of a 
> MultiPhraseQueryNode under some circumstances.
> Classic query parser output: {{+content:a +content:320}}  *(correct)*
> {code:java}
> QueryParser classicQueryParser;
> classicQueryParser = new QueryParser(Version.LUCENE_45, "content", analyzer);
> classicQueryParser.setDefaultOperator(Operator.AND);
> classicQueryParser.parse("a320");
> {code}
> Flexible query parser output: {{content:a content:320}} *(wrong)*
> {code:java}
> StandardQueryParser flexibleQueryParser;
> flexibleQueryParser = new StandardQueryParser(analyzer);
> flexibleQueryParser.setDefaultOperator(Operator.AND);
> flexibleQueryParser.parse("a320", "content");
> {code}
> The used analyzer:
> {code:java}
> Analyzer analyzer = new Analyzer() {
>   protected TokenStreamComponents createComponents(String field, Reader in) {
>     Tokenizer src = new WhitespaceTokenizer(Version.LUCENE_45, in);
>     TokenStream tok = new WordDelimiterFilter(src,
>         WordDelimiterFilter.SPLIT_ON_NUMERICS |
>         WordDelimiterFilter.GENERATE_WORD_PARTS |
>         WordDelimiterFilter.GENERATE_NUMBER_PARTS,
>         CharArraySet.EMPTY_SET);
>     return new TokenStreamComponents(src, tok);
>   }
> };
> {code}






[jira] [Commented] (LUCENE-5344) Flexible StandardQueryParser behaves differently than ClassicQueryParser

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873024#comment-13873024
 ] 

ASF subversion and git services commented on LUCENE-5344:
-

Commit 1558693 from [~adriano_crestani] in branch 'dev/trunk'
[ https://svn.apache.org/r1558693 ]

LUCENE-5344: adding the change to CHANGES.txt

> Flexible StandardQueryParser behaves differently than ClassicQueryParser
> 
>
> Key: LUCENE-5344
> URL: https://issues.apache.org/jira/browse/LUCENE-5344






[jira] [Created] (LUCENE-5403) move WithNestedTests from src/test to src/test-framework

2014-01-15 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5403:
---

 Summary: move WithNestedTests from src/test to src/test-framework
 Key: LUCENE-5403
 URL: https://issues.apache.org/jira/browse/LUCENE-5403
 Project: Lucene - Core
  Issue Type: Test
  Components: general/test
Reporter: Robert Muir


This class is abstract: it's useful if you want test-the-tester tests.






[jira] [Commented] (SOLR-5636) SolrRequestParsers does some xpath lookups on every request.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873012#comment-13873012
 ] 

ASF subversion and git services commented on SOLR-5636:
---

Commit 1558690 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558690 ]

SOLR-5636: SolrRequestParsers does some xpath lookups on every request, which 
can cause concurrency issues.

> SolrRequestParsers does some xpath lookups on every request.
> 
>
> Key: SOLR-5636
> URL: https://issues.apache.org/jira/browse/SOLR-5636
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5636.patch
>
>
> This seems a bit wasteful for one, but also, under heavy load, with lots of 
> cores on a node, I've seen this xpath parsing randomly fail with weird 
> NullPointerExceptions. Perhaps it depends on the xml parser you end up using. 
> Anyway, it's easy to work around and avoid the parsing every time 
> SolrDispatchFilter is hit by just doing it up front once.
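The general shape of the fix (a sketch only; the names are invented and this 
is not Solr's actual code) is to do the XPath work once at construction time 
and reuse the result on every request:

{code:java}
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Sketch: hoist per-request XPath evaluation to construction time.
public class RequestParsersSketch {
  private final boolean enableRemoteStreams; // computed once, reused per request

  public RequestParsersSketch(Document solrConfig) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Evaluate up front instead of on every request: the evaluation is not
    // cheap, and some parser implementations misbehave under concurrency.
    enableRemoteStreams = "true".equals(
        xpath.evaluate("//requestParsers/@enableRemoteStreaming", solrConfig));
  }

  public void parseRequest(/* ... */) {
    if (enableRemoteStreams) {
      // handle remote streaming without touching the XML again
    }
  }
}
{code}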






[jira] [Commented] (SOLR-5636) SolrRequestParsers does some xpath lookups on every request.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873010#comment-13873010
 ] 

ASF subversion and git services commented on SOLR-5636:
---

Commit 1558688 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1558688 ]

SOLR-5636: SolrRequestParsers does some xpath lookups on every request, which 
can cause concurrency issues.

> SolrRequestParsers does some xpath lookups on every request.
> 
>
> Key: SOLR-5636
> URL: https://issues.apache.org/jira/browse/SOLR-5636






[jira] [Updated] (LUCENE-5402) Add support for index-time pruning in Document*Dictionary (Suggester)

2014-01-15 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5402:
-

Attachment: LUCENE-5402.patch

Initial patch for index-time pruning support for DocumentDictionary and 
DocumentExpressionDictionary with tests.

> Add support for index-time pruning in Document*Dictionary (Suggester)
> -
>
> Key: LUCENE-5402
> URL: https://issues.apache.org/jira/browse/LUCENE-5402
> Attachments: LUCENE-5402.patch






[jira] [Created] (LUCENE-5402) Add support for index-time pruning in Document*Dictionary (Suggester)

2014-01-15 Thread Areek Zillur (JIRA)
Areek Zillur created LUCENE-5402:


 Summary: Add support for index-time pruning in Document*Dictionary 
(Suggester)
 Key: LUCENE-5402
 URL: https://issues.apache.org/jira/browse/LUCENE-5402
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur
 Fix For: 5.0, 4.7


It would be nice to be able to prune out entries that the suggester consumes by 
some query.







[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872957#comment-13872957
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558670 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1558670 ]

SOLR-1301: Throw an error if HdfsDirectoryFactory is not configured for now.

> Add a Solr contrib that allows for building Solr indexes via Hadoop's 
> Map-Reduce.
> -
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301
> Project: Solr
>  Issue Type: New Feature
>Reporter: Andrzej Bialecki 
>Assignee: Mark Miller
> Fix For: 5.0, 4.7
>
> Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, 
> SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, 
> commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, 
> hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, 
> log4j-1.2.15.jar
>
>
> This patch contains a contrib module that provides distributed indexing 
> (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is 
> twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of 
> OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
> SolrOutputFormat consumes data produced by reduce tasks directly, without 
> storing it in intermediate files. Furthermore, by using an 
> EmbeddedSolrServer, the indexing task is split into as many parts as there 
> are reducers, and the data to be indexed is not sent over the network.
> Design
> --
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
> which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
> instantiates an EmbeddedSolrServer, and it also instantiates an 
> implementation of SolrDocumentConverter, which is responsible for turning 
> Hadoop (key, value) into a SolrInputDocument. This data is then added to a 
> batch, which is periodically submitted to EmbeddedSolrServer. When the reduce 
> task completes and the OutputFormat is closed, SolrRecordWriter calls 
> commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home 
> directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories 
> as there were reduce tasks. The output shards are placed in the output 
> directory on the default filesystem (e.g. HDFS). Such part-N directories 
> can be used to run N shard servers. Additionally, users can specify the 
> number of reduce tasks, in particular 1 reduce task, in which case the output 
> will consist of a single shard.
> An example application is provided that processes large CSV files and uses 
> this API. It uses custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
> issue, you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor 
> and approved for release under Apache License.
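A condensed sketch of the RecordWriter half of this design, assuming the 
newer Hadoop mapreduce API and simplified constructor wiring (the actual 
patch differs in names and detail):

{code:java}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: index reducer output directly into an embedded Solr core,
// committing and optimizing when the task's output is closed.
public class SolrRecordWriterSketch extends RecordWriter<Text, SolrInputDocument> {
  private final EmbeddedSolrServer solr; // one embedded server per reducer => one shard

  public SolrRecordWriterSketch(EmbeddedSolrServer solr) {
    this.solr = solr;
  }

  @Override
  public void write(Text key, SolrInputDocument doc) throws IOException {
    try {
      solr.add(doc); // batching elided for brevity
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
  }

  @Override
  public void close(TaskAttemptContext context) throws IOException {
    try {
      solr.commit();   // as described above: commit() and optimize() on close
      solr.optimize();
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
  }
}
{code}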






[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872959#comment-13872959
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558671 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558671 ]

SOLR-1301: Throw an error if HdfsDirectoryFactory is not configured for now.

> Add a Solr contrib that allows for building Solr indexes via Hadoop's 
> Map-Reduce.
> -
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301






[jira] [Commented] (LUCENE-5401) Field.StringTokenStream#end() does not call super.end()

2014-01-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872936#comment-13872936
 ] 

Robert Muir commented on LUCENE-5401:
-

+1, nice catch!

> Field.StringTokenStream#end() does not call super.end()
> ---
>
> Key: LUCENE-5401
> URL: https://issues.apache.org/jira/browse/LUCENE-5401






[jira] [Resolved] (SOLR-5631) Add support for FreeTextSuggester in SolrSuggester Component

2014-01-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-5631.
---

Resolution: Fixed

Thanks Areek!

> Add support for FreeTextSuggester in SolrSuggester Component
> 
>
> Key: SOLR-5631
> URL: https://issues.apache.org/jira/browse/SOLR-5631
> Project: Solr
>  Issue Type: New Feature
>  Components: SearchComponents - other
>Reporter: Areek Zillur
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5631.patch, SOLR-5631.patch, SOLR-5631.patch
>
>
> Given that the new SuggesterComponent can get suggestions from multiple 
> suggesters at once, it would be nice to add support for FreeTextSuggester in 
> Solr. 
> This suggester can be used as a fallback suggester in conjunction with other 
> suggesters.






[jira] [Commented] (SOLR-5631) Add support for FreeTextSuggester in SolrSuggester Component

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872932#comment-13872932
 ] 

ASF subversion and git services commented on SOLR-5631:
---

Commit 1558659 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558659 ]

SOLR-5631: Add support for FreeTextSuggester

> Add support for FreeTextSuggester in SolrSuggester Component
> 
>
> Key: SOLR-5631
> URL: https://issues.apache.org/jira/browse/SOLR-5631






[jira] [Updated] (LUCENE-5401) Field.StringTokenStream#end() does not call super.end()

2014-01-15 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch updated LUCENE-5401:
--

Attachment: lucene-5401.patch

> Field.StringTokenStream#end() does not call super.end()
> ---
>
> Key: LUCENE-5401
> URL: https://issues.apache.org/jira/browse/LUCENE-5401






[jira] [Created] (LUCENE-5401) Field.StringTokenStream#end() does not call super.end()

2014-01-15 Thread Michael Busch (JIRA)
Michael Busch created LUCENE-5401:
-

 Summary: Field.StringTokenStream#end() does not call super.end()
 Key: LUCENE-5401
 URL: https://issues.apache.org/jira/browse/LUCENE-5401
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.6
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 4.6.1


Field.StringTokenStream#end() currently does not call super.end(). This 
prevents resetting the PositionIncrementAttribute to 0 in end(), which can lead 
to wrong positions in the index under certain conditions.

I added a test to TestDocument which indexes two Fields with the same name, 
String values, indexed=true, tokenized=false and 
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS. Without the fix the test fails. The 
first token gets the correct position 0, but the second token gets position 2 
instead of 1. The reason is that in DocInverterPerField line 176 (which is just 
after the call to end()) we increment the position a second time, because end() 
didn't reset the increment to 0.

All tests pass with the fix.
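The fix is essentially a one-line chain to the superclass; here is a minimal 
sketch (the token emission logic of the real StringTokenStream is elided):

{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;

// Sketch of the fix: end() must call super.end(), which resets the
// PositionIncrementAttribute to 0.
final class StringTokenStreamSketch extends TokenStream {
  @Override
  public boolean incrementToken() {
    return false; // token emission elided; only end() matters here
  }

  @Override
  public void end() throws IOException {
    super.end(); // the missing call: without it, the last token's increment
                 // leaks and the next field instance gets an extra +1 position
  }
}
{code}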






[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872916#comment-13872916
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558647 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1558647 ]

SOLR-1301: Move CHANGES entry to 4.7

> Add a Solr contrib that allows for building Solr indexes via Hadoop's 
> Map-Reduce.
> -
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301






[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators

2014-01-15 Thread Andrew Buchanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Buchanan updated SOLR-2649:
--

Attachment: SOLR-2649.diff

Here is the initial patch. Really it just involves removing some code and 
adding a few tests to confirm things work. It also modifies the previously 
mentioned test to conform with the expectations above.

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Priority: Minor
> Fix For: 4.7
>
> Attachments: SOLR-2649.diff
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.






[jira] [Commented] (SOLR-5631) Add support for FreeTextSuggester in SolrSuggester Component

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872911#comment-13872911
 ] 

ASF subversion and git services commented on SOLR-5631:
---

Commit 1558635 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1558635 ]

SOLR-5631: Add support for FreeTextSuggester

> Add support for FreeTextSuggester in SolrSuggester Component
> 
>
> Key: SOLR-5631
> URL: https://issues.apache.org/jira/browse/SOLR-5631






[jira] [Updated] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5399:
---

Attachment: LUCENE-5399.patch

I think this is ready ... here's the patch (using diffSources.py).

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used: each document 
> must look up the ord, then look up the bytes, then compare bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).
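A sketch of the proposed 'after' slot (the method names are hypothetical; the 
point is that the previous page's bottom value is resolved once, so the 
per-document check can be an ordinal comparison instead of ord -> bytes -> 
compare):

{code:java}
// Hypothetical comparator surface for the proposed 'after' slot.
abstract class AfterSlotComparatorSketch<T> {
  // Called once per page with the last value of the previous page; a
  // String implementation can resolve it to an ordinal here, one time.
  abstract void setAfter(T value);

  // Called per candidate document; with the 'after' value pre-resolved,
  // this can be a cheap int ordinal comparison.
  abstract int compareDocToAfter(int doc);
}
{code}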






[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872847#comment-13872847
 ] 

ASF subversion and git services commented on LUCENE-5399:
-

Commit 1558621 from [~mikemccand] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558621 ]

LUCENE-5399: remove nocommit

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399






[jira] [Commented] (SOLR-5354) Distributed sort is broken with CUSTOM FieldType

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872841#comment-13872841
 ] 

ASF subversion and git services commented on SOLR-5354:
---

Commit 1558618 from [~rcmuir] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558618 ]

LUCENE-5399, SOLR-5354: fix distributed grouping to marshal/unmarshal sort 
values properly

> Distributed sort is broken with CUSTOM FieldType
> 
>
> Key: SOLR-5354
> URL: https://issues.apache.org/jira/browse/SOLR-5354
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 4.4, 4.5, 4.6, 5.0
>Reporter: Jessica Cheng
>Assignee: Steve Rowe
>  Labels: custom, query, sort
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5354.patch, SOLR-5354.patch, SOLR-5354.patch, 
> SOLR-5354.patch, SOLR-5354__fix_function_edge_case.patch
>
>
> We added a custom field type to allow an indexed binary field type that 
> supports search (exact match), prefix search, and sort as an unsigned-bytes 
> lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator 
> accomplishes what we want; even though the name of the comparator 
> mentions UTF8, it doesn't actually assume so and just does byte-level 
> operations, so it's good. However, when we do this across different nodes, we 
> run into an issue in QueryComponent.doFieldSortValues:
>   // Must do the same conversion when sorting by a
>   // String field in Lucene, which returns the terms
>   // data as BytesRef:
>   if (val instanceof BytesRef) {
> UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
> field.setStringValue(spare.toString());
> val = ft.toObject(field);
>   }
> UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually 
> UTF8. I did a hack where I specified our own field comparator to be 
> ByteBuffer based to get around that instanceof check, but then the field 
> value gets transformed into BYTEARR in JavaBinCodec, and when it's 
> unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a 
> ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, 
> which decides to give me comparatorNatural in the else of the TODO for 
> CUSTOM, which barfs because byte[] are not Comparable...
> From Chris Hostetter:
> I'm not very familiar with the distributed sorting code, but based on your
> comments, and a quick skim of the functions you pointed to, it definitely
> seems like there are two problems here for people trying to implement
> custom sorting in custom FieldTypes...
> 1) QueryComponent.doFieldSortValues - this definitely seems like it should
> be based on the FieldType, not an "instanceof BytesRef" check (oddly: the
> comment even suggests that it should be using the FieldType's
> indexedToReadable() method -- but it doesn't do that). If it did, then
> this part of the logic should work for you as long as your custom
> FieldType implemented indexedToReadable in a sane way.
> 2) QueryComponent.mergeIds - that TODO definitely looks like a gap that
> needs filling. I'm guessing the sanest thing to do in the CUSTOM case
> would be to ask the FieldComparatorSource (which should be coming from the
> SortField that the custom FieldType produced) to create a FieldComparator
> (via newComparator - the numHits & sortPos could be anything) and then
> wrap that up in a Comparator facade that delegates to
> FieldComparator.compareValues
> That way a custom FieldType could be in complete control of the sort
> comparisons (even when merging ids).
> ...But as I said: I may be missing something; I'm not super familiar with
> that code. Please try it out and let us know if that works -- either way
> please open a Jira pointing out the problems trying to implement
> distributed sorting in a custom FieldType.
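A sketch of the facade suggested above for the CUSTOM case of 
QueryComponent.mergeIds (the numHits/sortPos arguments are arbitrary, as 
noted; this is not the committed fix):

{code:java}
import java.io.IOException;
import java.util.Comparator;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.SortField;

// Sketch: ask the custom SortField's FieldComparatorSource for a comparator
// and delegate merged-value comparisons to FieldComparator.compareValues.
final class CustomSortFacade {
  @SuppressWarnings({"unchecked", "rawtypes"})
  static Comparator<Object> comparatorFor(SortField sortField) throws IOException {
    final FieldComparator comparator = sortField.getComparatorSource()
        .newComparator(sortField.getField(), 1, 0, sortField.getReverse());
    return new Comparator<Object>() {
      public int compare(Object a, Object b) {
        return comparator.compareValues(a, b);
      }
    };
  }
}
{code}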






[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872842#comment-13872842
 ] 

ASF subversion and git services commented on LUCENE-5399:
-

Commit 1558618 from [~rcmuir] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558618 ]

LUCENE-5399, SOLR-5354: fix distributed grouping to marshal/unmarshal sort 
values properly

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399






[JENKINS] Lucene-Solr-4.x-Windows (64bit/jdk1.7.0_51) - Build # 3601 - Failure!

2014-01-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3601/
Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 16904 lines...]
[javac] Compiling 10 source files to 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\build\contrib\solr-morphlines-core\classes\java
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\contrib\morphlines-core\src\java\org\apache\solr\morphlines\solr\SolrLocator.java:102:
 error: exception MalformedURLException is never thrown in body of 
corresponding try statement
[javac]   } catch (MalformedURLException e) {
[javac] ^
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 error

[...truncated 1 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:459: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:439: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:39: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\extra-targets.xml:37: 
The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\build.xml:209: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:441:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\common-build.xml:491:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\solr\contrib\map-reduce\build.xml:46:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\common-build.xml:507:
 The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\common-build.xml:1764:
 Compile failed; see the compiler error output for details.

Total time: 100 minutes 13 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops 
-XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

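The javac failure above (and the identical one in the next build) is easy to 
misread: "exception X is never thrown in body of corresponding try statement" 
is a hard compile error, raised when a catch clause names a checked exception 
that nothing in the try body can throw any more, typically after a refactor 
removed the throwing call. A minimal illustration (not the actual SolrLocator 
code):

{code:java}
import java.net.MalformedURLException;
import java.net.URL;

// Minimal illustration (not the actual SolrLocator code). javac rejects
// `catch (MalformedURLException e)` once nothing in the try body can throw
// it, e.g.:
//
//   try { return s.trim(); }                  // throws no checked exception
//   catch (MalformedURLException e) { ... }   // error: exception ... is never
//                                             // thrown in body of try statement
//
// The catch clause compiles only while the body can actually throw it:
class NeverThrownDemo {
  static URL parse(String s) {
    try {
      return new URL(s);                 // declares throws MalformedURLException
    } catch (MalformedURLException e) {  // reachable, so this compiles
      return null;
    }
  }
}
{code}
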
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.7.0_51) - Build # 9005 - Still Failing!

2014-01-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9005/
Java: 64bit/jdk1.7.0_51 -XX:+UseCompressedOops -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 16634 lines...]
[javac] Compiling 10 source files to 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build/contrib/solr-morphlines-core/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/contrib/morphlines-core/src/java/org/apache/solr/morphlines/solr/SolrLocator.java:102:
 error: exception MalformedURLException is never thrown in body of 
corresponding try statement
[javac]   } catch (MalformedURLException e) {
[javac] ^
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 error

[...truncated 1 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:459: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:439: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:39: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:37: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build.xml:209: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/common-build.xml:441: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/common-build.xml:491: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/contrib/map-reduce/build.xml:46:
 The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:507: 
The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1764: 
Compile failed; see the compiler error output for details.

Total time: 56 minutes 12 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0_51 -XX:+UseCompressedOops -XX:+UseG1GC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2014-01-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872666#comment-13872666
 ] 

Mark Miller commented on SOLR-4260:
---

Well, the effects I was seeing related to having a control collection with a 
core named collection1 and another collection called collection1. Over shard, 
and that causes some similar-looking effects.

I've addressed that and will see if ramping up my tests can spot anything - so 
far I cannot replicate it in a test though.

> Inconsistent numDocs between leader and replica
> ---
>
> Key: SOLR-4260
> URL: https://issues.apache.org/jira/browse/SOLR-4260
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Environment: 5.0.0.2013.01.04.15.31.51
>Reporter: Markus Jelsma
>Assignee: Mark Miller
>Priority: Critical
> Fix For: 5.0, 4.7
>
> Attachments: 192.168.20.102-replica1.png, 
> 192.168.20.104-replica2.png, clusterstate.png, 
> demo_shard1_replicas_out_of_sync.tgz
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using 
> CloudSolrServer, we see inconsistencies between the leader and replica for 
> some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
> a small deviation in the number of documents: the leader and replica deviate 
> by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my 
> attention: there were small IDF differences for exactly the same record, 
> causing it to shift positions in the result set. During those tests no 
> records were indexed. Consecutive catch-all queries also return different 
> values for numDocs.
> We're running a 10-node test cluster with 10 shards and a replication factor 
> of two, and frequently reindex using a fresh build from trunk. I had not seen 
> this issue for quite some time until a few days ago.
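
One way to confirm the deviation described above is to query each core of a 
shard directly, bypassing the distributed search path. A SolrJ sketch under 
assumed core URLs (the two URLs below are hypothetical):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Sketch: compare numFound between two replicas of the same shard by
// querying each core directly with distrib=false. URLs are hypothetical.
public class ReplicaDocCountCheck {
  public static void main(String[] args) throws SolrServerException {
    long leader  = numFound("http://host1:8983/solr/collection1_shard1_replica1");
    long replica = numFound("http://host2:8983/solr/collection1_shard1_replica2");
    System.out.println("leader=" + leader + " replica=" + replica
        + (leader == replica ? " (in sync)" : " (OUT OF SYNC)"));
  }

  static long numFound(String coreUrl) throws SolrServerException {
    HttpSolrServer server = new HttpSolrServer(coreUrl);
    SolrQuery q = new SolrQuery("*:*");
    q.set("distrib", "false");  // ask only this core, not the whole cloud
    try {
      return server.query(q).getResults().getNumFound();
    } finally {
      server.shutdown();
    }
  }
}
{code}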



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872665#comment-13872665
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558588 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558588 ]

SOLR-1301: IntelliJ config: morphlines-cell Solr contrib needs lucene-core 
test-scope dependency

> Add a Solr contrib that allows for building Solr indexes via Hadoop's 
> Map-Reduce.
> -
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301
> Project: Solr
>  Issue Type: New Feature
>Reporter: Andrzej Bialecki 
>Assignee: Mark Miller
> Fix For: 5.0, 4.7
>
> Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, 
> SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, 
> commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, 
> hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, 
> log4j-1.2.15.jar
>
>
> This patch contains a contrib module that provides distributed indexing 
> (using Hadoop) into Solr via EmbeddedSolrServer. The idea behind this module 
> is twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of 
> OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
> SolrOutputFormat consumes data produced by reduce tasks directly, without 
> storing it in intermediate files. Furthermore, by using an 
> EmbeddedSolrServer, the indexing task is split into as many parts as there 
> are reducers, and the data to be indexed is not sent over the network.
> Design
> --
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
> which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
> instantiates an EmbeddedSolrServer, and it also instantiates an 
> implementation of SolrDocumentConverter, which is responsible for turning a 
> Hadoop (key, value) pair into a SolrInputDocument. This data is then added to 
> a batch, which is periodically submitted to EmbeddedSolrServer. When a reduce 
> task completes and the OutputFormat is closed, SolrRecordWriter calls 
> commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home 
> directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories 
> as there were reduce tasks. The output shards are placed in the output 
> directory on the default filesystem (e.g. HDFS). Such part-N directories 
> can be used to run N shard servers. Additionally, users can specify the 
> number of reduce tasks, in particular 1 reduce task, in which case the output 
> will consist of a single shard.
> An example application is provided that processes large CSV files and uses 
> this API. It uses custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
> issue; you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor 
> and approved for release under the Apache License.
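
The design described above maps naturally onto Hadoop's RecordWriter contract. 
A compressed sketch of that flow (class name, field mapping, and batch size 
are illustrative stand-ins, not the patch's actual code):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch of the SolrRecordWriter flow: buffer documents, flush batches to an
// EmbeddedSolrServer, then commit/optimize when the reduce task closes.
class SketchSolrRecordWriter extends RecordWriter<Text, Text> {
  private final EmbeddedSolrServer solr;
  private final List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
  private static final int BATCH_SIZE = 1000;  // illustrative

  SketchSolrRecordWriter(EmbeddedSolrServer solr) { this.solr = solr; }

  @Override
  public void write(Text key, Text value) throws IOException {
    // Stand-in for SolrDocumentConverter: turn (key, value) into a document.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", key.toString());
    doc.addField("text", value.toString());
    batch.add(doc);
    if (batch.size() >= BATCH_SIZE) flush();
  }

  @Override
  public void close(TaskAttemptContext context) throws IOException {
    flush();
    try {
      solr.commit();    // as described: commit ...
      solr.optimize();  // ... and optimize once the reduce task finishes
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
  }

  private void flush() throws IOException {
    if (batch.isEmpty()) return;
    try {
      solr.add(batch);  // periodic batch submit to the embedded server
      batch.clear();
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
  }
}
{code}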



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872662#comment-13872662
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558586 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558586 ]

SOLR-1301: make debugging these tests a whole lot easier by sending map reduce 
job logging to std out




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872655#comment-13872655
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558584 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558584 ]

SOLR-1301: maven config: fix map-reduce test compilation problem by adding 
dependency on morphline-core's test jar




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872654#comment-13872654
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558582 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558582 ]

SOLR-1301: Ignore this test on Windows - there is a problem with Windows paths 
and Morphlines.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872652#comment-13872652
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558580 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558580 ]

SOLR-1301: Merge Morphlines modules up to Kite 0.10 and CDK 0.9




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872636#comment-13872636
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558572 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558572 ]

SOLR-1301: Update to Kite 0.10 from CDK 0.9




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-15 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872626#comment-13872626
 ] 

Anshum Gupta commented on SOLR-5477:


Also, SOLR-5519 suggests that "Creating ZK nodes should be done at overseer (as 
much as possible)."
[~noble.paul], any suggestions on that?

> Async execution of OverseerCollectionProcessor tasks
> 
>
> Key: SOLR-5477
> URL: https://issues.apache.org/jira/browse/SOLR-5477
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Anshum Gupta
> Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477.patch
>
>
> Typical collection admin commands are long running and it is very common to 
> have the requests get timed out. It is more of a problem if the cluster is 
> very large. Add an option to run these commands asynchronously:
> add an extra param async=true for all collection commands;
> the task is written to ZK and the caller is returned a task id.
> A separate collection admin command will be added to poll the status of the 
> task:
> command=status&id=7657668909
> If an id is not passed, all running async tasks should be listed.
> A separate queue is created to store in-process tasks. After a task is 
> completed the queue entry is removed. OverseerCollectionProcessor will perform 
> these tasks in multiple threads.
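
Since this is still at the design stage, here is a hypothetical client-side 
view of the flow described above, written with SolrJ's generic request API. 
The action names follow the description; the "requestid" response key is an 
assumption, not an existing API:

{code:java}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

// Hypothetical sketch of the proposed async flow: fire a long-running
// collection command with async=true, get back a task id, poll with
// command=status&id=<taskId>.
public class AsyncCollectionCallSketch {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

    // Submit, e.g. /admin/collections?action=SPLITSHARD&...&async=true
    ModifiableSolrParams submit = new ModifiableSolrParams();
    submit.set("action", "SPLITSHARD");
    submit.set("collection", "collection1");
    submit.set("shard", "shard1");
    submit.set("async", "true");
    NamedList<Object> rsp = solr.request(admin(submit));
    String taskId = (String) rsp.get("requestid");  // assumed response key

    // Poll: /admin/collections?action=status&id=<taskId>
    ModifiableSolrParams poll = new ModifiableSolrParams();
    poll.set("action", "status");
    poll.set("id", taskId);
    System.out.println(solr.request(admin(poll)));
  }

  private static QueryRequest admin(ModifiableSolrParams params) {
    QueryRequest req = new QueryRequest(params);
    req.setPath("/admin/collections");  // send to the collections admin handler
    return req;
  }
}
{code}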



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-15 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872621#comment-13872621
 ] 

Anshum Gupta commented on SOLR-5477:


Also, SOLR-5519 suggests that "Creating ZK nodes should be done at overseer (as 
much as possible)."
[~noble.paul], any suggestions on that?




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872593#comment-13872593
 ] 

ASF subversion and git services commented on LUCENE-5399:
-

Commit 1558565 from [~mikemccand] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558565 ]

LUCENE-5399: add fangs, but no new bugs found...




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5631) Add support for FreeTextSuggester in SolrSuggester Component

2014-01-15 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated SOLR-5631:
---

Attachment: SOLR-5631.patch

> Add support for FreeTextSuggester in SolrSuggester Component
> 
>
> Key: SOLR-5631
> URL: https://issues.apache.org/jira/browse/SOLR-5631
> Project: Solr
>  Issue Type: New Feature
>  Components: SearchComponents - other
>Reporter: Areek Zillur
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5631.patch, SOLR-5631.patch, SOLR-5631.patch
>
>
> Given that the new SuggesterComponent can get suggestions from multiple 
> suggesters at once, it would be nice to add support for FreeTextSuggester in 
> Solr.
> This suggester can be used as a fallback suggester in conjunction with other 
> suggesters.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5631) Add support for FreeTextSuggester in SolrSuggester Component

2014-01-15 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated SOLR-5631:
---

Attachment: (was: SOLR-5631.patch)




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5631) Add support for FreeTextSuggester in SolrSuggester Component

2014-01-15 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated SOLR-5631:
---

Attachment: SOLR-5631.patch




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5631) Add support for FreeTextSuggester in SolrSuggester Component

2014-01-15 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated SOLR-5631:
---

Attachment: SOLR-5631.patch

Fixed spelling.

Thanks for the review, Robert!




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-15 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872573#comment-13872573
 ] 

Anshum Gupta edited comment on SOLR-5477 at 1/15/14 8:52 PM:
-

I have a few questions regarding my approach for making the CoreAdmin calls 
async:

Approach #1:
* CoreAdmin requests get submitted to zk.
* Core watches its zk node for submitted tasks. The request object is the data 
in the node (when submitted).
* On completion, the core deletes the submitted task and puts a new node with 
the response and other metadata into zk.
* The Collection API watches the node when it submits a task and waits for it 
to complete.
* On completion of the Collection API call, delete all related core admin 
request nodes in zk that were generated.

* Cleaning up of request nodes in zk happens through an explicit API call.
* Having something along the following lines in zk would be helpful (a minimal 
client sketch follows below):

/tasks
  ./collections/collection1/task1
  ./cores/core1/collection1/task1/coretask1

This would help us delete the entire group of tasks associated with a 
core/collection/core task/collection task.

Questions:
* This move would mean having a lot more clients talk to and write to zk. Does 
this approach make sense as far as the intended direction of SolrCloud is 
concerned?
* Any suggestions/concerns about the scalability of zk with multiple clients 
writing updates to it?


Approach #2 (not the best option, and more like the option if zk has 
scalability issues with everyone writing/watching):
* Do not make CoreAdmin calls async, but instead introduce a tracking mode. Once 
the task is submitted [with async = "taskid"], track this request using an 
in-memory data structure. Even if the request times out, the client can go back 
and query the task status.
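
A minimal ZooKeeper client sketch of Approach #1's submit-and-watch cycle. The 
path, payload, and connection settings are illustrative, and the parent 
/tasks/... nodes are assumed to already exist:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative sketch of Approach #1: submit a core admin task as a zk node,
// then watch for the core to delete it (and write its response node).
public class ZkTaskSketch {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, new Watcher() {
      public void process(WatchedEvent event) { /* connection events */ }
    });

    String taskPath = "/tasks/cores/core1/collection1/task1/coretask1";
    byte[] request = "{\"action\":\"CREATE\",\"core\":\"core1\"}".getBytes("UTF-8");

    // Submit: the request object is the data in the node.
    zk.create(taskPath, request, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

    // Watch: fires when the core deletes the submitted task on completion;
    // the caller would then read the response node and clean up. (A real
    // client would block here until the watch fires.)
    zk.exists(taskPath, new Watcher() {
      public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
          System.out.println("task completed: " + event.getPath());
        }
      }
    });
  }
}
{code}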


was (Author: anshumg):
I have a few questions regarding my approach for making the CoreAdmin calls 
async:

Approach #1:
* CoreAdmin requests get submitted to zk.
* Core watches its zk node for submitted tasks. The request object is the data 
in the node (when submitted).
* On completion, the core deletes the submitted task and puts a new node with 
the response and other metadata into zk.
* The Collection API watches the node when it submits a task and waits for it 
to complete.
* On completion of the Collection API call, delete all related core admin 
request nodes in zk that were generated.

* Cleaning up of request nodes in zk happens through an explicit API call.
* Having something along the following lines in zk would be helpful:
{code}
/tasks
  ../collections/collection1/task1

  ../cores/core1/collection1/task1/coretask1
{code}
This would help us delete the entire group of tasks associated with a 
core/collection/core task/collection task.

Questions:
* This move would mean having a lot more clients talk to and write to zk. Does 
this approach make sense as far as the intended direction of SolrCloud is 
concerned?
* Any suggestions/concerns about the scalability of zk with multiple clients 
writing updates to it?


Approach #2 (not the best option, and more like the option if zk has 
scalability issues with everyone writing/watching):
* Do not make CoreAdmin calls async, but instead introduce a tracking mode. Once 
the task is submitted [with async = "taskid"], track this request using an 
in-memory data structure. Even if the request times out, the client can go back 
and query the task status.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-15 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872573#comment-13872573
 ] 

Anshum Gupta commented on SOLR-5477:


I have a few questions regarding my approach for making the CoreAdmin calls 
async:

Approach #1:
* CoreAdmin requests get submitted to zk.
* Core watches its zk node for submitted tasks. The request object is the data 
in the node (when submitted).
* On completion, the core deletes the submitted task and puts a new node with 
the response and other metadata into zk.
* The Collection API watches the node when it submits a task and waits for it 
to complete.
* On completion of the Collection API call, delete all related core admin 
request nodes in zk that were generated.

* Cleaning up of request nodes in zk happens through an explicit API call.
* Having something along the following lines in zk would be helpful:
{code:title=ZK Path}
/tasks
  ../collections/collection1/task1

  ../cores/core1/collection1/task1/coretask1
{code}
This would help us delete the entire group of tasks associated with a 
core/collection/core task/collection task.

Questions:
* This move would mean having a lot more clients talk to and write to zk. Does 
this approach make sense as far as the intended direction of SolrCloud is 
concerned?
* Any suggestions/concerns about the scalability of zk with multiple clients 
writing updates to it?


Approach #2 (not the best option, and more like the option if zk has 
scalability issues with everyone writing/watching):
* Do not make CoreAdmin calls async, but instead introduce a tracking mode. Once 
the task is submitted [with async = "taskid"], track this request using an 
in-memory data structure. Even if the request times out, the client can go back 
and query the task status.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-15 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872573#comment-13872573
 ] 

Anshum Gupta edited comment on SOLR-5477 at 1/15/14 8:51 PM:
-

I have a few questions regarding my approach for making the CoreAdmin calls 
async:

Approach #1:
* CoreAdmin requests get submitted to zk.
* Core watches its zk node for submitted tasks. The request object is the data 
in the node (when submitted).
* On completion, the core deletes the submitted task and puts a new node with 
the response and other metadata into zk.
* The Collection API watches the node when it submits a task and waits for it 
to complete.
* On completion of the Collection API call, delete all related core admin 
request nodes in zk that were generated.

* Cleaning up of request nodes in zk happens through an explicit API call.
* Having something along the following lines in zk would be helpful:
{code}
/tasks
  ../collections/collection1/task1

  ../cores/core1/collection1/task1/coretask1
{code}
This would help us delete the entire group of tasks associated with a 
core/collection/core task/collection task.

Questions:
* This move would mean having a lot more clients talk to and write to zk. Does 
this approach make sense as far as the intended direction of SolrCloud is 
concerned?
* Any suggestions/concerns about the scalability of zk with multiple clients 
writing updates to it?


Approach #2 (not the best option, and more like the option if zk has 
scalability issues with everyone writing/watching):
* Do not make CoreAdmin calls async, but instead introduce a tracking mode. Once 
the task is submitted [with async = "taskid"], track this request using an 
in-memory data structure. Even if the request times out, the client can go back 
and query the task status.


was (Author: anshumg):
I have a few questions regarding my approach for making the CoreAdmin calls 
async:

Approach #1:
* CoreAdmin requests get submitted to zk.
* Core watches its zk node for submitted tasks. The request object is the data 
in the node (when submitted).
* On completion, the core deletes the submitted task and puts a new node with 
the response and other metadata into zk.
* The Collection API watches the node when it submits a task and waits for it 
to complete.
* On completion of the Collection API call, delete all related core admin 
request nodes in zk that were generated.

* Cleaning up of request nodes in zk happens through an explicit API call.
* Having something along the following lines in zk would be helpful:
{code:title=ZK Path}
/tasks
  ../collections/collection1/task1

  ../cores/core1/collection1/task1/coretask1
{code}
This would help us delete the entire group of tasks associated with a 
core/collection/core task/collection task.

Questions:
* This move would mean having a lot more clients talk to and write to zk. Does 
this approach make sense as far as the intended direction of SolrCloud is 
concerned?
* Any suggestions/concerns about the scalability of zk with multiple clients 
writing updates to it?


Approach #2 (not the best option, and more like the option if zk has 
scalability issues with everyone writing/watching):
* Do not make CoreAdmin calls async, but instead introduce a tracking mode. Once 
the task is submitted [with async = "taskid"], track this request using an 
in-memory data structure. Even if the request times out, the client can go back 
and query the task status.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene / Solr 4.6.1

2014-01-15 Thread Simon Willnauer
+1

On Wed, Jan 15, 2014 at 8:02 PM, Mark Miller  wrote:
> Unless there is an objection, I’m going to try and make a first RC tonight.
>
> - Mark
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872512#comment-13872512
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558553 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558553 ]

SOLR-1301: Update jar checksums for Morphlines 0.9.0




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872505#comment-13872505
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558551 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558551 ]

SOLR-1301: Update to Morphlines 0.9.0




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872508#comment-13872508
 ] 

ASF subversion and git services commented on LUCENE-5399:
-

Commit 1558552 from [~rcmuir] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558552 ]

LUCENE-5399: remove code duplication

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each 
> document must look up the ord, then look up the bytes, then compare the 
> bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).
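To make the cost difference concrete, here is a rough sketch of the 
per-document check on each path, written against Lucene 4.x 
SortedDocValues-style calls. This is illustrative only, not the actual 
PagingFieldCollector code, and the 'after' variable names are made up.

    import org.apache.lucene.index.SortedDocValues;
    import org.apache.lucene.util.BytesRef;

    // Illustrative only -- not the actual collector code.
    class AfterCheckSketch {
      // Slow path: per document, resolve the ord, then the bytes,
      // then compare bytes.
      static int compareByValue(SortedDocValues sorted, int doc,
                                BytesRef afterBytes) {
        BytesRef term = new BytesRef();
        sorted.lookupOrd(sorted.getOrd(doc), term);
        return term.compareTo(afterBytes);
      }

      // Fast path with an 'after' slot: afterBytes is resolved to afterOrd
      // once per segment, so the per-document check is an int comparison.
      static int compareByOrd(SortedDocValues sorted, int doc, int afterOrd) {
        return sorted.getOrd(doc) - afterOrd;
      }
    }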






[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872500#comment-13872500
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558548 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558548 ]

SOLR-1301: Fix a couple of bugs around setting up the embedded Solr instance.




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872497#comment-13872497
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558547 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558547 ]

SOLR-1301: ignore '.iml' in new Solr contribs' directories; put new Solr 
contribs' lib/ and test-lib/ directories under Subversion control; ignore 
'.jar' in these directories




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872492#comment-13872492
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558545 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558545 ]

SOLR-1301: Clean up.




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872489#comment-13872489
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558544 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558544 ]

SOLR-1301: Merge in latest morphlines module updates.




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872485#comment-13872485
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558541 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558541 ]

SOLR-1301: Merge in latest solr-map-reduce updates.




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872481#comment-13872481
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558540 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558540 ]

SOLR-1301: Straighten out module names so that they match current convention




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872460#comment-13872460
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558533 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558533 ]

SOLR-1301: remove unnecessary (POM-only) dependency 
org.apache.hadoop:hadoop-yarn-server




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872455#comment-13872455
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558529 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558529 ]

SOLR-1301: Ignore these tests on java 8 and j9 for now.




RE: [Apache Solr] Filter query Suggester and Spellchecker

2014-01-15 Thread Dyer, James
Alessandro,

The "spellcheck.collate" feature already supports this by specifying 
"spellcheck.maxCollationTries" greater than zero.  This is useful both to 
prevent unauthorized access to data and also to guarantee that suggested 
collations will return some results.

But "maxCollationTries" accomplishes this by running the proposed collation 
queries against the index.  If you are interested in preventing unauthorized 
access only, then you can probably get better performance with a lower-level 
filter on the term level.  There is currently no way to filter the single-term 
suggestions.
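For reference, a minimal SolrJ sketch of the existing collation behavior 
(the collection URL, query, and ACL field are made up; it assumes the 
spellcheck.collateParam.* override prefix to pass an fq to the collation 
test queries):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // Hypothetical example (Solr 4.x SolrJ): because collations are re-tried
    // against the index, a filter query applied to the collation test
    // queries also restricts which collations survive.
    public class CollateFilterExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("smartphoen");
        q.set("spellcheck", true);
        q.set("spellcheck.collate", true);
        q.set("spellcheck.maxCollationTries", 5); // verify against the index
        q.set("spellcheck.collateParam.fq", "acl:groupA"); // made-up ACL field
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getSpellCheckResponse().getCollatedResult());
      }
    }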

I could see this as a nice enhancement, but given the current 
"maxCollationTries" support, it may have a pretty narrow use case.

I've also thought about moving all the collate functionality down to the 
Lucene level, so that clients other than Solr can take advantage of it.  
Perhaps something along the lines of your proposal could be a step in that 
direction?

James Dyer
Ingram Content Group
(615) 213-4311

From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
Sent: Wednesday, January 15, 2014 11:53 AM
To: dev@lucene.apache.org
Subject: Re: [Apache Solr] Filter query Suggester and Spellchecker

No one? Guys?

2014/1/14 Alessandro Benedetti <benedetti.ale...@gmail.com>:
Hi guys,
this is a proposal for an improvement.
I propose adding the ability to suggest terms (for spellchecking and 
auto-suggest) based only on a subset of documents.

In this way we can provide security implementations that allow users to see 
term suggestions only from the documents they are allowed to see.

These are the proposed approaches:

Filter query Auto Suggest

1) Retrieve the suggested tokens from the input text using the existing 
cutting-edge FST-based suggester.
2) Use a TermEnum-like approach, because:
a) we have a small set of suggestions (reasonable, because we can cap it at 
5-10 suggestions max), so the TermEnum approach will be fast;
b) for each suggested token we can get the posting list and intersect it 
with the DocId list resulting from the filter query; if the intersection is 
empty, we do not return the suggestion.

Filter query Spellcheck

1) Use the existing cutting-edge FSA-based direct index spellchecker to get 
the suggestions.
2) Use a TermEnum-like approach, because:
a) we have a small set of suggestions (reasonable, because we can cap it at 
5-10 suggestions max), so the TermEnum approach will be fast;
b) for each suggested token we can get the posting list and intersect it 
with the DocId list resulting from the filter query; if the intersection is 
empty, we do not return the suggestion.

Of course we will have to add a further parameter to the request handler, 
something like:
spellcheck.qf
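A rough Lucene 4.x sketch of the intersection in step 2b (the method shape 
and the allowedDocs bitset are assumptions for illustration, not an actual 
patch; allowedDocs would come from executing the filter query):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.index.DocsEnum;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.util.Bits;

    // Keep a suggested term only if at least one document containing it
    // is also in the allowed set.
    public class SuggestionFilterSketch {
      public static List<Term> filter(AtomicReader reader,
                                      List<Term> suggestions,
                                      Bits allowedDocs) throws IOException {
        List<Term> kept = new ArrayList<Term>();
        for (Term t : suggestions) {
          DocsEnum docs = reader.termDocsEnum(t);
          if (docs == null) {
            continue; // term not present in this segment
          }
          int doc;
          while ((doc = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            if (allowedDocs.get(doc)) { // non-empty intersection
              kept.add(t);
              break;
            }
          }
        }
        return kept;
      }
    }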

Let me know your impressions and ideas,

Cheers





--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872450#comment-13872450
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558525 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558525 ]

SOLR-1301: Ignore windows tests that cannot work because they use UNIX 
semantics. Also remove a never-executed test which tests nothing




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872448#comment-13872448
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558524 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558524 ]

SOLR-1301: Fix windows problem with escaping of folder name (see crazy 
https://github.com/typesafehub/config/blob/master/HOCON.md for correct format: 
string must be quoted and escaped like javascript)




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872443#comment-13872443
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558523 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558523 ]

SOLR-1301: Fix compilation for Java 8 (the Java 8 compiler is more picky, but 
it's not a Java 8 regression: the code was just wrong)




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872441#comment-13872441
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558522 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558522 ]

SOLR-1301: Ivy likes to act funny if you don't declare compile and test 
resources in the same dependency.




[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872435#comment-13872435
 ] 

ASF subversion and git services commented on SOLR-1301:
---

Commit 1558520 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558520 ]

SOLR-1301: Add a Solr contrib that allows for building Solr indexes via 
Hadoop's MapReduce.

> Add a Solr contrib that allows for building Solr indexes via Hadoop's 
> Map-Reduce.
> -
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301
> Project: Solr
>  Issue Type: New Feature
>Reporter: Andrzej Bialecki 
>Assignee: Mark Miller
> Fix For: 5.0, 4.7
>
> Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, 
> SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, 
> commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, 
> hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, 
> log4j-1.2.15.jar
>
>
> This patch contains a contrib module that provides distributed indexing 
> (using Hadoop) to a Solr EmbeddedSolrServer. The idea behind this module is 
> twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of 
> OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS: 
> SolrOutputFormat consumes data produced by reduce tasks directly, without 
> storing it in intermediate files. Furthermore, by using an 
> EmbeddedSolrServer, the indexing task is split into as many parts as there 
> are reducers, and the data to be indexed is not sent over the network.
> Design
> --
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
> which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
> instantiates an EmbeddedSolrServer, and it also instantiates an 
> implementation of SolrDocumentConverter, which is responsible for turning a 
> Hadoop (key, value) pair into a SolrInputDocument. This data is then added to 
> a batch, which is periodically submitted to the EmbeddedSolrServer. When a 
> reduce task completes and the OutputFormat is closed, SolrRecordWriter calls 
> commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home 
> directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories 
> as there were reduce tasks. The output shards are placed in the output 
> directory on the default filesystem (e.g. HDFS). Such part-N directories 
> can be used to run N shard servers. Additionally, users can specify the 
> number of reduce tasks, in particular 1 reduce task, in which case the output 
> will consist of a single shard.
> An example application is provided that processes large CSV files and uses 
> this API. It uses custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
> issue; you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor 
> and approved for release under the Apache License.
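
The description above leaves SolrDocumentConverter abstract. For illustration, 
here is a minimal sketch of what an implementation could look like, assuming a 
simple convert(key, value) hook (the exact signature in the patch may differ):

{code:java}
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.solr.common.SolrInputDocument;

// Hypothetical converter: turns one Hadoop (key, value) pair into a
// SolrInputDocument that SolrRecordWriter can batch and index.
public class CsvLineConverter {
  public SolrInputDocument convert(LongWritable key, Text value) {
    // Assumed CSV layout: id,title (purely illustrative).
    String[] cols = value.toString().split(",", 2);
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", cols[0]);
    doc.addField("title", cols.length > 1 ? cols[1] : "");
    return doc;
  }
}
{code}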



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872431#comment-13872431
 ] 

ASF subversion and git services commented on LUCENE-5399:
-

Commit 1558516 from [~mikemccand] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558516 ]

LUCENE-5399: add fangs, fix 2 bugs

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each document 
> must look up the ord, then look up the bytes, then compare the bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).
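
To make the cost concrete, here is an illustrative sketch (not Lucene source; 
it only assumes the 4.x sorted doc-values API) of the per-document work today 
versus a pure ordinal comparison once the 'after' value is resolved per segment:

{code:java}
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.util.BytesRef;

class PagingCompareSketch {
  // Today: every collected doc pays an ord lookup, a bytes lookup, and a
  // byte-wise compare against the value from the bottom of the previous page.
  static int slowCompare(SortedDocValues dv, int doc, BytesRef after, BytesRef scratch) {
    dv.lookupOrd(dv.getOrd(doc), scratch);
    return scratch.compareTo(after);
  }

  // With an 'after' slot: resolve the after-value's ord once per segment
  // (e.g. via dv.lookupTerm(after)) and compare two ints per document.
  static int fastCompare(SortedDocValues dv, int doc, int afterOrd) {
    return Integer.compare(dv.getOrd(doc), afterOrd);
  }
}
{code}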



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene / Solr 4.6.1

2014-01-15 Thread Mark Miller
Unless there is an objection, I’m going to try and make a first RC tonight.

- Mark
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b123) - Build # 9004 - Failure!

2014-01-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9004/
Java: 64bit/jdk1.8.0-ea-b123 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
REGRESSION:  org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT

Error Message:
expected:<3> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<3> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([BA623A1A6355178:BE20422619F4E38C]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.core.TestNonNRTOpen.assertNotNRT(TestNonNRTOpen.java:133)
at 
org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT(TestNonNRTOpen.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadL

Re: [Apache Solr] Filter query Suggester and Spellchecker

2014-01-15 Thread Shawn Heisey

On 1/15/2014 10:53 AM, Alessandro Benedetti wrote:

No one? guys ?


I don't really do anything with spellcheck or suggest, so I have no real 
comment.


I can however tell you the way things are generally handled with feature 
requests, suggestions, and proposals: File an issue in JIRA.  If you 
have any code written, upload a patch to the issue. If at all possible, 
make sure we know which code branch and SVN revision number was used to 
make the patch.


If you don't already have one, you'll need to create an account on 
Apache's JIRA server to create an issue.


https://issues.apache.org/jira/browse/SOLR

Thanks,
Shawn


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Apache Solr] Filter query Suggester and Spellchecker

2014-01-15 Thread Alessandro Benedetti
No one? guys ?


2014/1/14 Alessandro Benedetti 

> Hi guys,
> this proposal is for an improvement.
> I propose adding the ability to suggest terms (for spellchecking and
> auto-suggest) based only on a subset of documents.
>
> In this way we can provide security implementations that allow users
> to see term suggestions only from documents they are allowed to see.
>
> These are the proposed approaches (a sketch of step 2b follows below):
>
> *Filter query Auto Suggest*
>
> 1) retrieve the suggested tokens from the input text using the already
> cutting-edge FST-based suggester
> 2) use an approach similar to the TermsEnum, because
> a) we have a small set of suggestions (reasonable, because we can filter
> to 5-10 suggestions max),
> so the TermsEnum approach will be fast.
> b) for each suggested token we can get its posting list and intersect it
> with the resulting DocId list (from the filter query); if the intersection
> is empty, do not return the suggestion.
>
> *Filter query Spellcheck*
>
> 1) we can use the already cutting-edge FSA-based direct index spellchecker
> and get the suggestions
> 2) use an approach similar to the TermsEnum, because
> a) we have a small set of suggestions (reasonable, because we can filter
> to 5-10 suggestions max),
> so the TermsEnum approach will be fast.
> b) for each suggested token we can get its posting list and intersect it
> with the resulting DocId list (from the filter query); if the intersection
> is empty, do not return the suggestion.
>
> Of course we will have to add a further parameter to the request handler,
> something like:
> spellcheck.qf
>
> Let me know your impressions and ideas,
>
> Cheers
>
>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
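
A hedged sketch of step 2b above using Lucene 4.x APIs; the reader/filterBits 
plumbing is an assumption, not an existing Solr hook:

{code:java}
import java.io.IOException;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;

class SuggestionFilterSketch {
  // Keep a suggestion only if at least one document containing it survives
  // the filter query (filterBits = docs matching the filter query).
  static boolean matchesFilter(AtomicReader reader, String field,
                               BytesRef suggestion, Bits filterBits) throws IOException {
    Terms terms = reader.terms(field);
    if (terms == null) return false;
    TermsEnum te = terms.iterator(null);
    if (!te.seekExact(suggestion)) return false;
    // Intersect the suggestion's postings with the filter's doc id set.
    DocsEnum docs = te.docs(filterBits, null, DocsEnum.FLAG_NONE);
    return docs != null && docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS;
  }
}
{code}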



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


[jira] [Resolved] (SOLR-5632) Improve response message for reloading a non-existent core

2014-01-15 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-5632.
---

Resolution: Fixed

Thanks Anshum!

> Improve response message for reloading a non-existent core
> --
>
> Key: SOLR-5632
> URL: https://issues.apache.org/jira/browse/SOLR-5632
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.6
>Reporter: Anshum Gupta
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5632.patch
>
>
> Right now, when attempting to reload a non-existent core, the CoreAdmin 
> response just contains a stack trace and a message saying "Error handling 
> 'reload' action" with no further information. 
> It'd be good to change it to keep printing the stack trace in the log while 
> returning a readable message with more information.
> Ideally, we should fix this for other CoreAdmin calls too.
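
For illustration, the shape of the requested fix might look like the following 
sketch (not necessarily the committed patch):

{code:java}
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrException.ErrorCode;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;

class ReloadSketch {
  // Sketch: check for the core explicitly and return an informative message,
  // instead of letting a generic "Error handling 'reload' action" bubble up.
  static void reload(CoreContainer coreContainer, String cname) {
    SolrCore core = coreContainer.getCore(cname);
    if (core == null) {
      throw new SolrException(ErrorCode.BAD_REQUEST,
          "Core with core name [" + cname + "] does not exist.");
    }
    try {
      coreContainer.reload(cname);
    } finally {
      core.close();  // getCore() increments the ref count; release it
    }
  }
}
{code}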



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5632) Improve response message for reloading a non-existent core

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872282#comment-13872282
 ] 

ASF subversion and git services commented on SOLR-5632:
---

Commit 1558469 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558469 ]

SOLR-5632: Fix SolrCore leak.

> Improve response message for reloading a non-existent core
> --
>
> Key: SOLR-5632
> URL: https://issues.apache.org/jira/browse/SOLR-5632
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.6
>Reporter: Anshum Gupta
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5632.patch
>
>
> Right now, when attempting to reload a non-existent core, the CoreAdmin 
> response just contains a stack trace and a message saying "Error handling 
> 'reload' action" with no further information. 
> It'd be good to change it to keep printing the stack trace in the log while 
> returning a readable message with more information.
> Ideally, we should fix this for other CoreAdmin calls too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5632) Improve response message for reloading a non-existent core

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872280#comment-13872280
 ] 

ASF subversion and git services commented on SOLR-5632:
---

Commit 1558467 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1558467 ]

SOLR-5632: Fix SolrCore leak.

> Improve response message for reloading a non-existent core
> --
>
> Key: SOLR-5632
> URL: https://issues.apache.org/jira/browse/SOLR-5632
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.6
>Reporter: Anshum Gupta
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5632.patch
>
>
> Right now, when attempting to reload a non-existent core, the CoreAdmin 
> response just contains a stack trace and a message saying "Error handling 
> 'reload' action" with no further information. 
> It'd be good to change it to keep printing the stack trace in the log while 
> returning a readable message with more information.
> Ideally, we should fix this for other CoreAdmin calls too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5632) Improve response message for reloading a non-existent core

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872257#comment-13872257
 ] 

ASF subversion and git services commented on SOLR-5632:
---

Commit 1558460 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558460 ]

SOLR-5632: Improve response message for reloading a non-existent core.

> Improve response message for reloading a non-existent core
> --
>
> Key: SOLR-5632
> URL: https://issues.apache.org/jira/browse/SOLR-5632
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.6
>Reporter: Anshum Gupta
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5632.patch
>
>
> Right now, when attempting to reload a non-existent core, the CoreAdmin 
> response just contains a stack trace and a message saying "Error handling 
> 'reload' action" with no further information. 
> It'd be good to change it to keep printing the stack trace in the log while 
> returning a readable message with more information.
> Ideally, we should fix this for other CoreAdmin calls too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5632) Improve response message for reloading a non-existent core

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872255#comment-13872255
 ] 

ASF subversion and git services commented on SOLR-5632:
---

Commit 1558459 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1558459 ]

SOLR-5632: Improve response message for reloading a non-existent core.

> Improve response message for reloading a non-existent core
> --
>
> Key: SOLR-5632
> URL: https://issues.apache.org/jira/browse/SOLR-5632
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.6
>Reporter: Anshum Gupta
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5632.patch
>
>
> Right now, when attempting to reload a non-existent core, the CoreAdmin 
> response just contains a stack trace and a message saying "Error handling 
> 'reload' action" with no further information. 
> It'd be good to change it to keep printing the stack trace in the log while 
> returning a readable message with more information.
> Ideally, we should fix this for other CoreAdmin calls too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872237#comment-13872237
 ] 

ASF subversion and git services commented on LUCENE-5399:
-

Commit 1558451 from [~rcmuir] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558451 ]

LUCENE-5399: current state

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each document 
> must look up the ord, then look up the bytes, then compare the bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5632) Improve response message for reloading a non-existent core

2014-01-15 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5632:
--

Fix Version/s: 4.7
   5.0
 Assignee: Mark Miller  (was: Anshum Gupta)

> Improve response message for reloading a non-existent core
> --
>
> Key: SOLR-5632
> URL: https://issues.apache.org/jira/browse/SOLR-5632
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.6
>Reporter: Anshum Gupta
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5632.patch
>
>
> Right now, when attempting to reload a non-existent core, the CoreAdmin 
> response just contains a stack trace and a message saying "Error handling 
> 'reload' action" with no further information. 
> It'd be good to change it to keep printing the stack trace in the log while 
> returning a readable message with more information.
> Ideally, we should fix this for other CoreAdmin calls too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872236#comment-13872236
 ] 

ASF subversion and git services commented on LUCENE-5399:
-

Commit 1558450 from [~rcmuir] in branch 'dev/branches/lucene539399'
[ https://svn.apache.org/r1558450 ]

LUCENE-5399: make branch

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each document 
> must look up the ord, then look up the bytes, then compare the bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5399:


Attachment: LUCENE-5399.patch

patch with Uwe's idea.

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each document 
> must look up the ord, then look up the bytes, then compare the bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872227#comment-13872227
 ] 

Uwe Schindler commented on LUCENE-5399:
---

You could do:
{code:java}
  // Tell all comparators their top value:
  for (int i = 0; i < comparators.length; i++) {
    @SuppressWarnings("unchecked")
    FieldComparator<Object> comp = (FieldComparator<Object>) comparators[i];
    comp.setTopValue(after.fields[i]);
  }
{code}

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each document 
> must look up the ord, then look up the bytes, then compare the bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872228#comment-13872228
 ] 

Robert Muir commented on LUCENE-5399:
-

Mike, what java compiler are you using? :)

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each document 
> must look up the ord, then look up the bytes, then compare the bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (SOLR-3606) Set the default timeout of HttpClient to a nonzero value

2014-01-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Cruz updated SOLR-3606:
-

Comment: was deleted

(was: So, "10K threads will be enough for everyone"? If we cannot know the 
upper bound of this timeout value, at least it should be configurable so that 
users can set the correct value.)

> Set the default timeout of HttpClient to a nonzero value
> 
>
> Key: SOLR-3606
> URL: https://issues.apache.org/jira/browse/SOLR-3606
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.0
>Reporter: jiangwen wei
> Attachments: SOLR-3606.patch
>
>
> The default timeout of the HttpClient in HttpShardHandlerFactory and 
> SolrCmdDistributor is set to zero.
> A zero timeout means an infinite timeout, which may cause infinite waiting.
> Consider the following case, which we observed in our Solr cluster:
> There are two servers A and B in a Solr cluster with two shards.
> Server A receives a search request from a client and sends a sub-request to 
> server B.
> Server B also receives a search request from a client and sends a sub-request 
> to server A.
> The two requests can never complete if the Jetty server threads on both 
> server A and server B are exhausted.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3606) Set the default timeout of HttpClient to a nonzero value

2014-01-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872224#comment-13872224
 ] 

André Cruz commented on SOLR-3606:
--

So, "10K threads will be enough for everyone"? If we cannot know the upper 
bound of this timeout value, at least it should be configurable so that users 
can set the correct value.

> Set the default timeout of HttpClient to a nonzero value
> 
>
> Key: SOLR-3606
> URL: https://issues.apache.org/jira/browse/SOLR-3606
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.0
>Reporter: jiangwen wei
> Attachments: SOLR-3606.patch
>
>
> The default timeout of the HttpClient in HttpShardHandlerFactory and 
> SolrCmdDistributor is set to zero.
> A zero timeout means an infinite timeout, which may cause infinite waiting.
> Consider the following case, which we observed in our Solr cluster:
> There are two servers A and B in a Solr cluster with two shards.
> Server A receives a search request from a client and sends a sub-request to 
> server B.
> Server B also receives a search request from a client and sends a sub-request 
> to server A.
> The two requests can never complete if the Jetty server threads on both 
> server A and server B are exhausted.
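
For reference, a minimal sketch of giving HttpClient finite timeouts via the 
HttpClient 4.x params API; the values are placeholders, and this is not 
necessarily how the attached patch wires it:

{code:java}
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;

class TimeoutSketch {
  static HttpClient clientWithFiniteTimeouts() {
    HttpParams params = new BasicHttpParams();
    // Fail the connection attempt after 5s instead of waiting forever.
    HttpConnectionParams.setConnectionTimeout(params, 5000);
    // Fail a read that stalls for 30s, so a shard sub-request cannot pin
    // a Jetty thread indefinitely.
    HttpConnectionParams.setSoTimeout(params, 30000);
    return new DefaultHttpClient(params);
  }
}
{code}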



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5399:


Attachment: LUCENE-5399.patch

same as Mike's patch, but compiles with (at least my version of) java7.

would be great if it could be setTopValue(T value)

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each document 
> must look up the ord, then look up the bytes, then compare the bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5636) SolrRequestParsers does some xpath lookups on every request.

2014-01-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872175#comment-13872175
 ] 

Yonik Seeley commented on SOLR-5636:


Boy, that's definitely unexpected...
+1, commit it!

> SolrRequestParsers does some xpath lookups on every request.
> 
>
> Key: SOLR-5636
> URL: https://issues.apache.org/jira/browse/SOLR-5636
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5636.patch
>
>
> This seems a bit wasteful for one, but also, under heavy load, with lots of 
> cores on a node, I've seen this XPath parsing randomly fail with weird 
> NullPointerExceptions. Perhaps it depends on the XML parser you end up using. 
> Anyway, it's easy to work around: avoid the parsing every time 
> SolrDispatchFilter is hit by just doing it up front once.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5636) SolrRequestParsers does some xpath lookups on every request.

2014-01-15 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5636:
--

Attachment: SOLR-5636.patch

> SolrRequestParsers does some xpath lookups on every request.
> 
>
> Key: SOLR-5636
> URL: https://issues.apache.org/jira/browse/SOLR-5636
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-5636.patch
>
>
> This seems a bit wasteful for one, but also, under heavy load, with lots of 
> cores on a node, I've seen this XPath parsing randomly fail with weird 
> NullPointerExceptions. Perhaps it depends on the XML parser you end up using. 
> Anyway, it's easy to work around: avoid the parsing every time 
> SolrDispatchFilter is hit by just doing it up front once.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5395) Upgrade Spatial4j to 0.4

2014-01-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872163#comment-13872163
 ] 

David Smiley commented on LUCENE-5395:
--

I'm moving some point-parsing utilities and some specialized distance 
calculations (e.g. DistanceUtils.vectorDistance()) that are only used by Solr 
from Spatial4j into Solr.

> Upgrade Spatial4j to 0.4
> 
>
> Key: LUCENE-5395
> URL: https://issues.apache.org/jira/browse/LUCENE-5395
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 4.7
>
>
> Spatial4j 0.4 should be released the week of January 13th; a snapshot is 
> published. A longer version of the 0.4 delta is in 
> [CHANGES.md|https://github.com/spatial4j/spatial4j/blob/master/CHANGES.md]
> A couple of notable new features are:
> * Built-in WKT parser without relying on JTS. The older shape string format 
> is deprecated.
> * A binary shape codec for reading & writing shapes to a byte stream in a 
> reasonably compact manner.
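
A hedged usage sketch of the built-in WKT parser; the method name is assumed 
from the 0.4 snapshot, so verify it against CHANGES.md:

{code:java}
import com.spatial4j.core.context.SpatialContext;
import com.spatial4j.core.shape.Shape;

class WktSketch {
  public static void main(String[] args) throws java.text.ParseException {
    SpatialContext ctx = SpatialContext.GEO;
    // Parse a WKT point without JTS on the classpath (method name assumed).
    Shape point = ctx.readShapeFromWkt("POINT(-73.98 40.74)");
    System.out.println(point);
  }
}
{code}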



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5636) SolrRequestParsers does some xpath lookups on every request.

2014-01-15 Thread Mark Miller (JIRA)
Mark Miller created SOLR-5636:
-

 Summary: SolrRequestParsers does some xpath lookups on every 
request.
 Key: SOLR-5636
 URL: https://issues.apache.org/jira/browse/SOLR-5636
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.7


This seems a bit wasteful for one, but also, under heavy load, with lots of 
cores on a node, I've seen this XPath parsing randomly fail with weird 
NullPointerExceptions. Perhaps it depends on the XML parser you end up using. 
Anyway, it's easy to work around: avoid the parsing every time 
SolrDispatchFilter is hit by just doing it up front once.
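
A sketch of that work-around; the paths are the usual requestParsers 
attributes, but the class and field names here are illustrative, not the 
exact patch:

{code:java}
import org.apache.solr.core.SolrConfig;

class ParseOnceSketch {
  // Evaluate the XPath lookups once at construction time and cache the
  // results, instead of re-querying the config DOM on every request.
  private final boolean enableRemoteStreams;
  private final long multipartUploadLimitKB;

  ParseOnceSketch(SolrConfig config) {
    enableRemoteStreams = config.getBool(
        "requestDispatcher/requestParsers/@enableRemoteStreaming", false);
    multipartUploadLimitKB = config.getInt(
        "requestDispatcher/requestParsers/@multipartUploadLimitInKB", 2048);
  }

  boolean isEnableRemoteStreams() { return enableRemoteStreams; }
  long getMultipartUploadLimitKB() { return multipartUploadLimitKB; }
}
{code}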



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Document routing problem, upgrading from SolrCloud 4.0.0 to later 4.x

2014-01-15 Thread Per Steffensen

FYI

http://solrlucene.blogspot.dk/2014/01/document-routing-problem-upgrading-from.html 



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery

2014-01-15 Thread Dr Oleg Savrasov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872006#comment-13872006
 ] 

Dr Oleg Savrasov commented on LUCENE-5375:
--

Many thanks!

> ToChildBlockJoinQuery becomes crazy on wrong subquery
> -
>
> Key: LUCENE-5375
> URL: https://issues.apache.org/jira/browse/LUCENE-5375
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/join
>Affects Versions: 4.6
>Reporter: Dr Oleg Savrasov
>  Labels: patch
> Fix For: 5.0, 4.6.1
>
> Attachments: SOLR-5553-1.patch, 
> SOLR-5553-insufficient_assertions.patch, SOLR-5553.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If a user supplies a wrong subquery to ToParentBlockJoinQuery, it reasonably 
> throws an IllegalStateException. 
> (http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html
>  'The child documents must be orthogonal to the parent documents: the wrapped 
> child query must never return a parent document.'). However, 
> ToChildBlockJoinQuery just goes crazy silently. I want to provide a simple 
> patch for ToChildBlockJoinQuery with an if-throw clause and a test.
> See 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3cf415ce3a-ebe5-4d15-adf1-c5ead32a1...@sheffield.ac.uk%3E



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5399) PagingFieldCollector is very slow with String fields

2014-01-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5399:
---

Attachment: LUCENE-5399.patch

Another iteration, adding fangs to TestSearchAfter, fixing some crabs in 
StringValComparator, and fixing some nocommits ...

> PagingFieldCollector is very slow with String fields
> 
>
> Key: LUCENE-5399
> URL: https://issues.apache.org/jira/browse/LUCENE-5399
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch, 
> LUCENE-5399.patch, LUCENE-5399.patch, LUCENE-5399.patch
>
>
> PagingFieldCollector (sort comparator) is significantly slower with string 
> fields, because of how its "seen on a previous page" check works: it calls 
> compareDocToValue(int doc, T t) first to check this (it's the only user of 
> this method).
> This is very slow with String, because no ordinals are used, so each document 
> must look up the ord, then look up the bytes, then compare the bytes.
> I think maybe we should replace this method with an 'after' slot, and just 
> have compareDocToAfter or something.
> Otherwise we could use a hack-patch like the one I will upload (I did this 
> just to test the performance, although tests do pass).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871981#comment-13871981
 ] 

ASF subversion and git services commented on LUCENE-5375:
-

Commit 1558351 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1558351 ]

LUCENE-5375: fix javadocs

> ToChildBlockJoinQuery becomes crazy on wrong subquery
> -
>
> Key: LUCENE-5375
> URL: https://issues.apache.org/jira/browse/LUCENE-5375
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/join
>Affects Versions: 4.6
>Reporter: Dr Oleg Savrasov
>  Labels: patch
> Fix For: 5.0, 4.6.1
>
> Attachments: SOLR-5553-1.patch, 
> SOLR-5553-insufficient_assertions.patch, SOLR-5553.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If a user supplies a wrong subquery to ToParentBlockJoinQuery, it reasonably 
> throws an IllegalStateException. 
> (http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html
>  'The child documents must be orthogonal to the parent documents: the wrapped 
> child query must never return a parent document.'). However, 
> ToChildBlockJoinQuery just goes crazy silently. I want to provide a simple 
> patch for ToChildBlockJoinQuery with an if-throw clause and a test.
> See 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3cf415ce3a-ebe5-4d15-adf1-c5ead32a1...@sheffield.ac.uk%3E



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871982#comment-13871982
 ] 

ASF subversion and git services commented on LUCENE-5375:
-

Commit 1558352 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_6'
[ https://svn.apache.org/r1558352 ]

LUCENE-5375: fix javadocs

> ToChildBlockJoinQuery becomes crazy on wrong subquery
> -
>
> Key: LUCENE-5375
> URL: https://issues.apache.org/jira/browse/LUCENE-5375
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/join
>Affects Versions: 4.6
>Reporter: Dr Oleg Savrasov
>  Labels: patch
> Fix For: 5.0, 4.6.1
>
> Attachments: SOLR-5553-1.patch, 
> SOLR-5553-insufficient_assertions.patch, SOLR-5553.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If a user supplies a wrong subquery to ToParentBlockJoinQuery, it reasonably 
> throws an IllegalStateException. 
> (http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html
>  'The child documents must be orthogonal to the parent documents: the wrapped 
> child query must never return a parent document.'). However, 
> ToChildBlockJoinQuery just goes crazy silently. I want to provide a simple 
> patch for ToChildBlockJoinQuery with an if-throw clause and a test.
> See 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3cf415ce3a-ebe5-4d15-adf1-c5ead32a1...@sheffield.ac.uk%3E



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871980#comment-13871980
 ] 

ASF subversion and git services commented on LUCENE-5375:
-

Commit 1558350 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1558350 ]

LUCENE-5375: fix javadocs

> ToChildBlockJoinQuery becomes crazy on wrong subquery
> -
>
> Key: LUCENE-5375
> URL: https://issues.apache.org/jira/browse/LUCENE-5375
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/join
>Affects Versions: 4.6
>Reporter: Dr Oleg Savrasov
>  Labels: patch
> Fix For: 5.0, 4.6.1
>
> Attachments: SOLR-5553-1.patch, 
> SOLR-5553-insufficient_assertions.patch, SOLR-5553.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If a user supplies a wrong subquery to ToParentBlockJoinQuery, it reasonably 
> throws an IllegalStateException. 
> (http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html
>  'The child documents must be orthogonal to the parent documents: the wrapped 
> child query must never return a parent document.'). However, 
> ToChildBlockJoinQuery just goes crazy silently. I want to provide a simple 
> patch for ToChildBlockJoinQuery with an if-throw clause and a test.
> See 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3cf415ce3a-ebe5-4d15-adf1-c5ead32a1...@sheffield.ac.uk%3E



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b123) - Build # 9000 - Failure!

2014-01-15 Thread Michael McCandless
Urgh, I'll fix.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jan 15, 2014 at 6:34 AM, Policeman Jenkins Server
 wrote:
> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9000/
> Java: 64bit/jdk1.8.0-ea-b123 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC
>
> All tests passed
>
> Build Log:
> [...truncated 34186 lines...]
> -documentation-lint:
>  [echo] checking for broken html...
> [jtidy] Checking for broken html (such as invalid tags)...
>[delete] Deleting directory 
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/jtidy_tmp
>  [echo] Checking for broken links...
>  [exec]
>  [exec] Crawl/parse...
>  [exec]
>  [exec] Verify...
>  [echo] Checking for missing docs...
>  [exec]
>  [exec] 
> build/docs/join/org/apache/lucene/search/join/ToChildBlockJoinQuery.html
>  [exec]   missing Fields: None
>  [exec]
>  [exec] Missing javadocs were found!
>
> BUILD FAILED
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:459: The following 
> error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:57: The following 
> error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build.xml:208: The 
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build.xml:245: The 
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:2331:
>  exec returned: 1
>
> Total time: 58 minutes 2 seconds
> Build step 'Invoke Ant' marked build as failure
> Description set: Java: 64bit/jdk1.8.0-ea-b123 -XX:-UseCompressedOops 
> -XX:+UseConcMarkSweepGC
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure
> Sending email for trigger: Failure
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b123) - Build # 9000 - Failure!

2014-01-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9000/
Java: 64bit/jdk1.8.0-ea-b123 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 34186 lines...]
-documentation-lint:
 [echo] checking for broken html...
[jtidy] Checking for broken html (such as invalid tags)...
   [delete] Deleting directory 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/jtidy_tmp
 [echo] Checking for broken links...
 [exec] 
 [exec] Crawl/parse...
 [exec] 
 [exec] Verify...
 [echo] Checking for missing docs...
 [exec] 
 [exec] 
build/docs/join/org/apache/lucene/search/join/ToChildBlockJoinQuery.html
 [exec]   missing Fields: None
 [exec] 
 [exec] Missing javadocs were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:459: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:57: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build.xml:208: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build.xml:245: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:2331: 
exec returned: 1

Total time: 58 minutes 2 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.8.0-ea-b123 -XX:-UseCompressedOops 
-XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: JDK 8 Build 123 & JDK 7 Update 60 build 02 are available on java.net

2014-01-15 Thread Rory O'Donnell Oracle, Dublin Ireland

Hi Uwe,

Another link for you to read/consider on getting a fix into a Critical 
Patch Update.


http://openjdk.java.net/projects/jdk7u/criticalcpufixes.html

Rgds, Rory



On 15/01/2014 10:34, Uwe Schindler wrote:

Hi Dalibor,

thanks for the links! This answers my questions!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: dalibor topic [mailto:dalibor.to...@oracle.com]
Sent: Wednesday, January 15, 2014 11:26 AM
To: Uwe Schindler
Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; 'Dawid Weiss';
'Cecilia Borg'; 'Balchandra Vaidya'
Subject: Re: JDK 8 Build 123 & JDK 7 Update 60 build 02 are available on
java.net

7u51 is a critical patch update. Please see
http://www.oracle.com/technetwork/topics/security/alerts-086861.html for
the schedule of critical patch updates.

See
http://mail.openjdk.java.net/pipermail/jdk7u-dev/2013-
November/008040.html
for information on 7u60.

On 15.01.2014 01:09, Uwe Schindler wrote:

It may be good to get some information a bit earlier, when and which
updates will be released.



--
 Dalibor Topic | Principal Product Manager
Phone: +494089091214  | Mobile: +491737185961


ORACLE Deutschland B.V. & Co. KG | Kühnehöfe 5 | 22761 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603
Geschäftsführer: Jürgen Kunz

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande Handelsregister der
Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher

 Oracle is committed to developing
practices and products that help protect the environment


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA , Dublin, Ireland


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: JDK 8 Build 123 & JDK 7 Update 60 build 02 are available on java.net

2014-01-15 Thread Uwe Schindler
Hi Dalibor,

thanks for the links! This answers my questions!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: dalibor topic [mailto:dalibor.to...@oracle.com]
> Sent: Wednesday, January 15, 2014 11:26 AM
> To: Uwe Schindler
> Cc: rory.odonn...@oracle.com; dev@lucene.apache.org; 'Dawid Weiss';
> 'Cecilia Borg'; 'Balchandra Vaidya'
> Subject: Re: JDK 8 Build 123 & JDK 7 Update 60 build 02 are available on
> java.net
> 
> 7u51 is a critical patch update. Please see
> http://www.oracle.com/technetwork/topics/security/alerts-086861.html for
> the schedule of critical patch updates.
> 
> See
> http://mail.openjdk.java.net/pipermail/jdk7u-dev/2013-
> November/008040.html
> for information on 7u60.
> 
> On 15.01.2014 01:09, Uwe Schindler wrote:
> >
> > It may be good to get some information a bit earlier, when and which
> > updates will be released.
> >
> >
> --
>  Dalibor Topic | Principal Product Manager
> Phone: +494089091214  | Mobile: +491737185961
> 
> 
> ORACLE Deutschland B.V. & Co. KG | Kühnehöfe 5 | 22761 Hamburg
> 
> ORACLE Deutschland B.V. & Co. KG
> Hauptverwaltung: Riesstr. 25, D-80992 München
> Registergericht: Amtsgericht München, HRA 95603
> Geschäftsführer: Jürgen Kunz
> 
> Komplementärin: ORACLE Deutschland Verwaltung B.V.
> Hertogswetering 163/167, 3543 AS Utrecht, Niederlande Handelsregister der
> Handelskammer Midden-Niederlande, Nr. 30143697
> Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher
> 
>  Oracle is committed to developing
> practices and products that help protect the environment


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: JDK 8 Build 123 & JDK 7 Update 60 build 02 are available on java.net

2014-01-15 Thread dalibor topic
7u51 is a critical patch update. Please see 
http://www.oracle.com/technetwork/topics/security/alerts-086861.html for 
the schedule of critical patch updates.


See 
http://mail.openjdk.java.net/pipermail/jdk7u-dev/2013-November/008040.html 
for information on 7u60.


On 15.01.2014 01:09, Uwe Schindler wrote:


It may be good to get some information a bit earlier, when and which 
updates will be released.




--
 Dalibor Topic | Principal Product Manager
Phone: +494089091214  | Mobile: +491737185961 



ORACLE Deutschland B.V. & Co. KG | Kühnehöfe 5 | 22761 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603
Geschäftsführer: Jürgen Kunz

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher

 Oracle is committed to developing 
practices and products that help protect the environment


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery

2014-01-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871901#comment-13871901
 ] 

ASF subversion and git services commented on LUCENE-5375:
-

Commit 1558336 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_6'
[ https://svn.apache.org/r1558336 ]

LUCENE-5375: ToChildBlockJoinQuery works harder to detect mis-use, where the 
parent query incorrectly returns child docs

> ToChildBlockJoinQuery becomes crazy on wrong subquery
> -
>
> Key: LUCENE-5375
> URL: https://issues.apache.org/jira/browse/LUCENE-5375
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/join
>Affects Versions: 4.6
>Reporter: Dr Oleg Savrasov
>  Labels: patch
> Fix For: 5.0, 4.6.1
>
> Attachments: SOLR-5553-1.patch, 
> SOLR-5553-insufficient_assertions.patch, SOLR-5553.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If a user supplies a wrong subquery to ToParentBlockJoinQuery, it reasonably 
> throws an IllegalStateException. 
> (http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html
>  'The child documents must be orthogonal to the parent documents: the wrapped 
> child query must never return a parent document.'). However, 
> ToChildBlockJoinQuery just goes crazy silently. I want to provide a simple 
> patch for ToChildBlockJoinQuery with an if-throw clause and a test.
> See 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3cf415ce3a-ebe5-4d15-adf1-c5ead32a1...@sheffield.ac.uk%3E



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery

2014-01-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-5375.


   Resolution: Fixed
Fix Version/s: 4.6.1
   5.0

Woop, sorry Oleg: this had fallen past the event horizon on my TODO list!

Thank you, I just committed the last patch.

> ToChildBlockJoinQuery becomes crazy on wrong subquery
> -
>
> Key: LUCENE-5375
> URL: https://issues.apache.org/jira/browse/LUCENE-5375
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/join
>Affects Versions: 4.6
>Reporter: Dr Oleg Savrasov
>  Labels: patch
> Fix For: 5.0, 4.6.1
>
> Attachments: SOLR-5553-1.patch, 
> SOLR-5553-insufficient_assertions.patch, SOLR-5553.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If a user supplies a wrong subquery to ToParentBlockJoinQuery, it reasonably 
> throws an IllegalStateException. 
> (http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html
>  'The child documents must be orthogonal to the parent documents: the wrapped 
> child query must never return a parent document.'). However, 
> ToChildBlockJoinQuery just goes crazy silently. I want to provide a simple 
> patch for ToChildBlockJoinQuery with an if-throw clause and a test.
> See 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3cf415ce3a-ebe5-4d15-adf1-c5ead32a1...@sheffield.ac.uk%3E
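
For illustration, the proposed if-throw guard could be as small as the 
following sketch; the names are assumptions, not the committed code:

{code:java}
import org.apache.lucene.util.FixedBitSet;

class ParentCheckSketch {
  // The "parent" query just returned parentDoc; verify it is actually
  // flagged as a parent in the parent filter's bit set, and fail fast
  // instead of silently returning garbage.
  static void checkParent(FixedBitSet parentBits, int parentDoc) {
    if (!parentBits.get(parentDoc)) {
      throw new IllegalStateException(
          "parent query must not match child docs; got docID=" + parentDoc);
    }
  }
}
{code}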



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


