[jira] [Updated] (SOLR-5544) Log spamming by DefaultSolrHighlighter
[ https://issues.apache.org/jira/browse/SOLR-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MANISH KUMAR updated SOLR-5544:
-------------------------------
    Summary: Log spamming by DefaultSolrHighlighter  (was: Log spamming DefaultSolrHighlighter)

> Log spamming by DefaultSolrHighlighter
> --------------------------------------
>
>                 Key: SOLR-5544
>                 URL: https://issues.apache.org/jira/browse/SOLR-5544
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>    Affects Versions: 4.0
>            Reporter: MANISH KUMAR
>            Priority: Minor
>
> In DefaultSolrHighlighter.java, the method useFastVectorHighlighter contains:
> log.warn( "Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName );
> This method is called for each field, and there are cases where TermPositions and TermOffsets are not stored, so this line spams the logs heavily.
> It should be at most a DEBUG-level log, which would give the flexibility to turn it off in production environments.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5544) Log spamming DefaultSolrHighlighter
[ https://issues.apache.org/jira/browse/SOLR-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MANISH KUMAR updated SOLR-5544:
-------------------------------
    Priority: Minor  (was: Major)

> Log spamming DefaultSolrHighlighter
> -----------------------------------
>
>                 Key: SOLR-5544
>                 URL: https://issues.apache.org/jira/browse/SOLR-5544
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>    Affects Versions: 4.0
>            Reporter: MANISH KUMAR
>            Priority: Minor
>
> In DefaultSolrHighlighter.java, the method useFastVectorHighlighter contains:
> log.warn( "Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName );
> This method is called for each field, and there are cases where TermPositions and TermOffsets are not stored, so this line spams the logs heavily.
> It should be at most a DEBUG-level log, which would give the flexibility to turn it off in production environments.
[jira] [Created] (SOLR-5544) Log spamming DefaultSolrHighlighter
MANISH KUMAR created SOLR-5544:
-------------------------------

             Summary: Log spamming DefaultSolrHighlighter
                 Key: SOLR-5544
                 URL: https://issues.apache.org/jira/browse/SOLR-5544
             Project: Solr
          Issue Type: Improvement
          Components: highlighter
    Affects Versions: 4.0
            Reporter: MANISH KUMAR

In DefaultSolrHighlighter.java, the method useFastVectorHighlighter contains:

log.warn( "Solr will use Highlighter instead of FastVectorHighlighter because {} field does not store TermPositions and TermOffsets.", fieldName );

This method is called for each field, and there are cases where TermPositions and TermOffsets are not stored, so this line spams the logs heavily.

It should be at most a DEBUG-level log, which would give the flexibility to turn it off in production environments.
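The requested change amounts to demoting the per-field message from WARN to DEBUG. A minimal sketch of the idea follows; the class skeleton and method parameters here are illustrative stand-ins, not the real DefaultSolrHighlighter source (which uses SLF4J — java.util.logging is used below only to keep the example self-contained):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the proposed fix: the fallback message fires once per field, so
// at WARN it floods the logs; at DEBUG/FINE it can be switched off in
// production via logger configuration.
class HighlighterChoice {
    private static final Logger log = Logger.getLogger(HighlighterChoice.class.getName());

    static boolean useFastVectorHighlighter(boolean storesTermPositionsAndOffsets, String fieldName) {
        if (!storesTermPositionsAndOffsets) {
            // Was log.warn(...) in the report above; demoted to FINE (JUL's DEBUG analog).
            log.log(Level.FINE,
                "Solr will use Highlighter instead of FastVectorHighlighter because {0} field does not store TermPositions and TermOffsets.",
                fieldName);
            return false;
        }
        return true;
    }
}
```

With the default JUL configuration (INFO threshold), the FINE message is simply dropped, which is exactly the production behavior the reporter asks for.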
[jira] [Created] (SOLR-5543) solr.xml duplicate entries after SWAP 4.6
Bill Bell created SOLR-5543:
----------------------------

             Summary: solr.xml duplicate entries after SWAP 4.6
                 Key: SOLR-5543
                 URL: https://issues.apache.org/jira/browse/SOLR-5543
             Project: Solr
          Issue Type: Bug
    Affects Versions: 4.6
            Reporter: Bill Bell

We are having issues with SWAP in CoreAdmin in 4.6. Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. It had been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi-core schema doesn't work with persistent="true" - it creates duplicate lines in solr.xml.
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843935#comment-13843935 ]

Noble Paul commented on SOLR-5473:
----------------------------------

bq. if(debugState &&

Thanks for the suggestion. However, I added it for my dev testing; it will be removed before commit.

> Make one state.json per collection
> ----------------------------------
>
>                 Key: SOLR-5473
>                 URL: https://issues.apache.org/jira/browse/SOLR-5473
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch
>
>
> As defined in the parent issue, store the state of each collection under the /collections/collectionname/state.json node
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843933#comment-13843933 ]

Noble Paul commented on SOLR-5473:
----------------------------------

[~timp74] allCollections will store ALL collections. If you are looking at trunk: there are no external collections in trunk yet. Please apply the patch and check.

> Make one state.json per collection
> ----------------------------------
>
>                 Key: SOLR-5473
>                 URL: https://issues.apache.org/jira/browse/SOLR-5473
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch
>
>
> As defined in the parent issue, store the state of each collection under the /collections/collectionname/state.json node
[jira] [Commented] (SOLR-4983) Problematic core naming by collection create API
[ https://issues.apache.org/jira/browse/SOLR-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843840#comment-13843840 ]

Noble Paul commented on SOLR-4983:
----------------------------------

I think solving this problem alone is simple. If the collection is present in the same JVM, it is very easy to look up the collection and, if there is a core that serves the collection, set the fromIndex to that core. If the user can ensure that all his collections are present on all nodes, it will be OK. The hard part is making it work with a remote node.

> Problematic core naming by collection create API
> ------------------------------------------------
>
>                 Key: SOLR-4983
>                 URL: https://issues.apache.org/jira/browse/SOLR-4983
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Chris Toomey
>
> The SolrCloud collection create API creates cores named "foo_shard_replica" when asked to create collection "foo".
> This is problematic for at least 2 reasons:
> 1) these ugly core names show up in the core admin UI, and will vary depending on which node is being used,
> 2) it prevents collections from being used in SolrCloud joins, since join takes a core name as the fromIndex parameter and there's no single core name for the collection. As I've documented in https://issues.apache.org/jira/browse/SOLR-4905 and http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4074038.html, SolrCloud join does work when the inner collection (fromIndex) is not sharded, assuming that collection is available and initialized at SolrCloud bootstrap time.
> Could this be changed to instead use the collection name for the core name? Or at least add a core-name option to the API?
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843827#comment-13843827 ]

Mark Miller commented on SOLR-1301:
-----------------------------------

bq. I'm not aware of anything needing jersey except perhaps hadoop pulls that in.

Yeah, the tests use this for running hadoop.

> Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-1301
>                 URL: https://issues.apache.org/jira/browse/SOLR-1301
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Andrzej Bialecki
>            Assignee: Mark Miller
>             Fix For: 5.0, 4.7
>
>         Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar
>
>
> This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network.
> Design
> ------
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) pairs into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When a reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard.
> An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License.
[jira] [Commented] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843774#comment-13843774 ]

Steve Rowe commented on SOLR-5463:
----------------------------------

Another idea about the cursor: the Base64-encoded text is used verbatim, including the trailing padding '=' characters. These could be stripped out for external use (since they're there just to make the string length divisible by four), and then added back before Base64-decoding. In a URL, non-metacharacter '='-s look weird, since '=' is already used to separate param names and values.

> Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous "page". This is similar to the "cursor" model I've seen in several other REST APIs that support "pagination" over large sets of results (notably the Twitter API and its "since_id" param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode
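The strip/restore round-trip Steve describes can be sketched as below. The helper names are hypothetical, not part of Solr's API; the point is only that padding can be dropped for URLs and deterministically re-added (a valid Base64 string's length is a multiple of four) before decoding:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helpers illustrating the suggestion: remove trailing '='
// padding from a Base64 cursor for external/URL use, and restore it before
// Base64-decoding the token again.
class CursorPadding {
    static String stripPadding(String base64) {
        // Padding only ever appears at the end, so a trailing-'=' strip is safe.
        return base64.replaceAll("=+$", "");
    }

    static String restorePadding(String stripped) {
        StringBuilder sb = new StringBuilder(stripped);
        while (sb.length() % 4 != 0) {
            sb.append('=');   // pad back out to a multiple of four
        }
        return sb.toString();
    }
}
```

For example, the cursor "AoEoMDU3OUIwMDI=" from the comments below would travel in the URL as "AoEoMDU3OUIwMDI" and be re-padded server-side before decoding.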
[jira] [Comment Edited] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843748#comment-13843748 ]

Steve Rowe edited comment on SOLR-5463 at 12/10/13 12:21 AM:
-------------------------------------------------------------

{quote}
* searchAfter => cursor
* nextSearchAfter => cursorContinue
{quote}

+1

bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page

I tried making this mistake (using the trailing unique id ("NOK" in this example) as the searchAfter param value), and I got the following error message:

{code}
{
  "responseHeader":{
    "status":400,
    "QTime":2},
  "error":{
    "msg":"Unable to parse search after totem: NOK",
    "code":400}}
{code}

(*edit*: {{cursorContinue}} => {{cursor}} in the sentence below)

I think that error message should include the param name ({{cursor}}) that couldn't be parsed.

Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such, like always prepending '*'. So your example of the future would become:

{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&cursor=*AoEjTk9L}
{
  "responseHeader":{
    "status":0,
    "QTime":7},
  "response":{"numFound":32,"start":-1,"docs":[
      // ... docs here...
    ]
  },
  "cursorContinue":"*AoEoMDU3OUIwMDI="}
{code}

The error message when someone gives an unparseable {{cursor}} could then include this piece of information: "cursors begin with an asterisk".

was (Author: steve_rowe):
{quote}
* searchAfter => cursor
* nextSearchAfter => cursorContinue
{quote}

+1

bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page

I tried making this mistake (using the trailing unique id ("NOK" in this example) as the searchAfter param value), and I got the following error message:

{code}
{
  "responseHeader":{
    "status":400,
    "QTime":2},
  "error":{
    "msg":"Unable to parse search after totem: NOK",
    "code":400}}
{code}

I think that error message should include the param name ({{cursorContinue}}) that couldn't be parsed.

Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such, like always prepending '*'. So your example of the future would become:

{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&cursor=*AoEjTk9L}
{
  "responseHeader":{
    "status":0,
    "QTime":7},
  "response":{"numFound":32,"start":-1,"docs":[
      // ... docs here...
    ]
  },
  "cursorContinue":"*AoEoMDU3OUIwMDI="}
{code}

The error message when someone gives an unparseable {{cursor}} could then include this piece of information: "cursors begin with an asterisk".

> Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous "page". This is similar to the "cursor" model I've seen in several other REST APIs that support "pagination" over large sets of results (notably the Twitter API and its "since_id" param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843749#comment-13843749 ]

Timothy Potter commented on SOLR-5473:
--------------------------------------

Thanks for fixing the CloudSolrServerTest failure ... One thing I wasn't sure about when looking over the latest patch was whether allCollections in ZkStateReader will hold the names of external collections? I assume so by the name *all*, but it doesn't seem like any external collection names are added to that Set currently.

> Make one state.json per collection
> ----------------------------------
>
>                 Key: SOLR-5473
>                 URL: https://issues.apache.org/jira/browse/SOLR-5473
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch
>
>
> As defined in the parent issue, store the state of each collection under the /collections/collectionname/state.json node
[jira] [Commented] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843748#comment-13843748 ]

Steve Rowe commented on SOLR-5463:
----------------------------------

{quote}
* searchAfter => cursor
* nextSearchAfter => cursorContinue
{quote}

+1

bq. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page

I tried making this mistake (using the trailing unique id ("NOK" in this example) as the searchAfter param value), and I got the following error message:

{code}
{
  "responseHeader":{
    "status":400,
    "QTime":2},
  "error":{
    "msg":"Unable to parse search after totem: NOK",
    "code":400}}
{code}

I think that error message should include the param name ({{cursorContinue}}) that couldn't be parsed.

Also, maybe it would be useful to include a prefix that will (probably) never be used in unique ids, to visually identify the cursor as such, like always prepending '*'. So your example of the future would become:

{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&cursor=*AoEjTk9L}
{
  "responseHeader":{
    "status":0,
    "QTime":7},
  "response":{"numFound":32,"start":-1,"docs":[
      // ... docs here...
    ]
  },
  "cursorContinue":"*AoEoMDU3OUIwMDI="}
{code}

The error message when someone gives an unparseable {{cursor}} could then include this piece of information: "cursors begin with an asterisk".

> Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous "page". This is similar to the "cursor" model I've seen in several other REST APIs that support "pagination" over large sets of results (notably the Twitter API and its "since_id" param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode
[jira] [Resolved] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe resolved LUCENE-5364.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 4.7
                   5.0
         Assignee: Steve Rowe
    Lucene Fields: New,Patch Available  (was: New)

Committed to trunk and branch_4x.

I added a note to the Lucene ReleaseToDo wiki page about using {{:Post-Release-Update-Version.LUCENE_XY:}} to find constants that should be upgraded to the next release version after a release branch has been cut.

> Review usages of hard-coded Version constants
> ---------------------------------------------
>
>                 Key: LUCENE-5364
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5364
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 5.0, 4.7
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Minor
>             Fix For: 5.0, 4.7
>
>         Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, LUCENE-5364-trunk.patch
>
>
> There are some hard-coded {{Version.LUCENE_XY}} constants used in various places. Some of these are intentional and appropriate:
> * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses {{Version.LUCENE_31}}
> * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other analysis components)
> * to test different behavior at different points in history (e.g. {{TestStopFilter}} to test position increments)
> But should hard-coded constants be used elsewhere?
> For those that should remain, and need to be updated with each release, there should be an easy way to find them.
[jira] [Commented] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843687#comment-13843687 ]

Hoss Man commented on SOLR-5463:
--------------------------------

The one significant change I still want to make, before abandoning this straw man and moving on to using PaginatingCollector under the covers, is to rethink the vocabulary.

At the Lucene/IndexSearcher level, this functionality is leveraged using a "searchAfter" param which indicates the exact "FieldDoc" returned by a previous search. The name makes a lot of sense in this API, given that the FieldDoc you specify is expected to come from a previous search, and you are specifying that you want to "search for documents after this document" in the context of the specified query/sort.

For the Solr request API, however, I feel like this terminology might confuse people. I'm concerned people might think they can use the uniqueKey of the last document they got on the previous page (instead of realizing they need to specify the special token they were returned as part of that page).

My thinking is that from a user perspective, we should call this functionality a "Result Cursor" and rename the request param and response key appropriately. Something along the lines of...

{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&cursor=AoEjTk9L}
{
  "responseHeader":{
    "status":0,
    "QTime":7},
  "response":{"numFound":32,"start":-1,"docs":[
      // ... docs here...
    ]
  },
  "cursorContinue":"AoEoMDU3OUIwMDI="}
{code}

* searchAfter => cursor
* nextSearchAfter => cursorContinue

What do folks think?

> Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous "page". This is similar to the "cursor" model I've seen in several other REST APIs that support "pagination" over large sets of results (notably the Twitter API and its "since_id" param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode
[jira] [Updated] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5463: --- Attachment: SOLR-5463__straw_man.patch Ok, updated patch making the change in user semantics I mentioned wanting to try last week. Best way to explain it is with a walk through of a simple example (note: if you try the current strawman code, the "numFound" and "start" values returned in the docList don't match what i've pasted in the examples below -- these examples show what the final results should look like in the finished solution) Initial requests using searchAfter should always start with a totem value of "{{\*}}" {code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&searchAfter=*} { "responseHeader":{ "status":0, "QTime":2}, "response":{"numFound":32,"start":-1,"docs":[ // ...20 docs here... ] }, "nextSearchAfter":"AoEjTk9L"} {code} The {{nextSearchAfter}} token returned by this request tells us what to use in the second request... {code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&searchAfter=AoEjTk9L} { "responseHeader":{ "status":0, "QTime":7}, "response":{"numFound":32,"start":-1,"docs":[ // ...12 docs here... ] }, "nextSearchAfter":"AoEoMDU3OUIwMDI="} {code} Since this result block contains fewer rows then were requested, the client could automatically stop, but the {{nextSearchAfter}} is still returned, and it's still safe to request a subsequent page (this is the fundemental diff from the previous patches, where {{nextSearchAfter}} was set to {{null}} anytime the code could tell there were no more results ... 
{code:title=http://localhost:8983/solr/deep?q=*:*&wt=json&indent=true&rows=20&fl=id,price&sort=id+desc&searchAfter=AoEoMDU3OUIwMDI=} { "responseHeader":{ "status":0, "QTime":1}, "response":{"numFound":32,"start":-1,"docs":[] }, "nextSearchAfter":"AoEoMDU3OUIwMDI="} {code} Note that in this case, with no docs included in the response, the {{nextSearchAfter}} totem is the same as the input. For some sorts this makes it possible for clients to "resume" a full walk of all documents matching a query -- picking up where they let off if more documents are added to the index that match (for example: when doing an ascending sort on a numeric uniqueKey field that always increases as new docs are added, sorting by a timestamp field (asc) indicating when documents are crawled, etc...) This also works as you would expect for searches that don't match any documents... {code:title=http://localhost:8983/solr/deep?q=text:bogus&rows=20&sort=id+desc&searchAfter=*} { "responseHeader":{ "status":0, "QTime":21}, "response":{"numFound":0,"start":-1,"docs":[] }, "nextSearchAfter":"*"} {code} > Provide cursor/token based "searchAfter" support that works with arbitrary > sorting (ie: "deep paging") > -- > > Key: SOLR-5463 > URL: https://issues.apache.org/jira/browse/SOLR-5463 > Project: Solr > Issue Type: New Feature >Reporter: Hoss Man >Assignee: Hoss Man > Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, > SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, > SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, > SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, > SOLR-5463__straw_man.patch > > > I'd like to revist a solution to the problem of "deep paging" in Solr, > leveraging an HTTP based API similar to how IndexSearcher.searchAfter works > at the lucene level: require the clients to provide back a token indicating > the sort values of the last document seen on the previous "page". 
This is > similar to the "cursor" model I've seen in several other REST APIs that > support "pagination" over large sets of results (notably the Twitter API and > its "since_id" param) except that we'll want something that works with > arbitrary multi-level sort criteria that can be either ascending or descending. > SOLR-1726 laid some initial ground work here and was committed quite a while > ago, but the key bit of argument parsing to leverage it was commented out due > to some problems (see comments in that issue). It's also somewhat out of > date at this point: at the time it was committed, IndexSearcher only supported > searchAfter for simple scores, not arbitrary field sorts; and the params > added in SOLR-1726 suffer from this limitation as well. > --- > I think it would make sense to start fresh with a new issue with a focus on > ensuring that we have deep paging which: > * supports arbitrary field sorts in addition to sorting by score > * works in distributed mode -- This message was sent by Atlassian JIRA (v6.1.4#6159) --
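The walk-through above reduces to a simple client loop: start with the {{\*}} totem, feed each returned nextSearchAfter token back into the next request, and stop once the token comes back unchanged (which, per the examples, means the page was empty). A minimal sketch of that loop; the in-memory sorted list and the fetchPage helper are stand-ins for the HTTP round trip, not code from the patch:

```java
import java.util.ArrayList;
import java.util.List;

public class CursorWalk {

    // Stand-in for one HTTP request: return up to 'rows' ids that sort after
    // the cursor position. Here the "index" is a sorted in-memory list and
    // the cursor token is simply the last id seen.
    static List<String> fetchPage(List<String> index, String cursor, int rows) {
        int from = cursor.equals("*") ? 0 : index.indexOf(cursor) + 1;
        int to = Math.min(from + rows, index.size());
        return index.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        for (int i = 0; i < 32; i++) index.add(String.format("doc%02d", i));

        List<String> collected = new ArrayList<>();
        String cursor = "*";                      // initial totem value
        while (true) {
            List<String> page = fetchPage(index, cursor, 20);
            collected.addAll(page);
            // nextSearchAfter: last sort value of the page, or unchanged if empty
            String next = page.isEmpty() ? cursor : page.get(page.size() - 1);
            if (next.equals(cursor)) break;       // token repeats => no more results
            cursor = next;
        }
        System.out.println(collected.size());     // all 32 docs, in pages of 20 + 12
        if (collected.size() != 32) throw new AssertionError();
    }
}
```

The "stop when the token repeats" test is exactly why it stays safe to request a subsequent page even after a short or empty result block: the client never needs a null sentinel.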
[jira] [Commented] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843658#comment-13843658 ] ASF subversion and git services commented on LUCENE-5364: - Commit 1549703 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549703 ] LUCENE-5364: Replace hard-coded Version.LUCENE_XY that doesn't have to be hard-coded (because of back-compat testing or version dependent behavior, or demo code that should exemplify pinning versions in user code), with Version.LUCENE_CURRENT in non-test code, or with LuceneTestCase.TEST_VERSION_CURRENT in test code; upgrade hard-coded Version.LUCENE_XY constants that should track the next release version to the next release version if they aren't already there, and put a token near them so that they can be found and upgraded when the next release version changes: ':Post-Release-Update-Version.LUCENE_XY:' (merge trunk r1549701) > Review usages of hard-coded Version constants > - > > Key: LUCENE-5364 > URL: https://issues.apache.org/jira/browse/LUCENE-5364 > Project: Lucene - Core > Issue Type: Bug > Components: core/other >Affects Versions: 5.0, 4.7 >Reporter: Steve Rowe >Priority: Minor > Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, > LUCENE-5364-trunk.patch > > > There are some hard-coded {{Version.LUCENE_XY}} constants used in various > places. Some of these are intentional and appropriate: > * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses > {{Version.LUCENE_31}} > * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other > analysis components) > * to test different behavior at different points in history (e.g. > {{TestStopFilter}} to test position increments) > But should hard-coded constants be used elsewhere? > For those that should remain, and need to be updated with each release, there > should be an easy way to find them. 
[jira] [Commented] (LUCENE-5364) Review usages of hard-coded Version constants
[ https://issues.apache.org/jira/browse/LUCENE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843652#comment-13843652 ] ASF subversion and git services commented on LUCENE-5364: - Commit 1549701 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1549701 ] LUCENE-5364: Replace hard-coded Version.LUCENE_XY that doesn't have to be hard-coded (because of back-compat testing or version dependent behavior, or demo code that should exemplify pinning versions in user code), with Version.LUCENE_CURRENT in non-test code, or with LuceneTestCase.TEST_VERSION_CURRENT in test code; upgrade hard-coded Version.LUCENE_XY constants that should track the next release version to the next release version if they aren't already there, and put a token near them so that they can be found and upgraded when the next release version changes: ':Post-Release-Update-Version.LUCENE_XY:' > Review usages of hard-coded Version constants > - > > Key: LUCENE-5364 > URL: https://issues.apache.org/jira/browse/LUCENE-5364 > Project: Lucene - Core > Issue Type: Bug > Components: core/other >Affects Versions: 5.0, 4.7 >Reporter: Steve Rowe >Priority: Minor > Attachments: LUCENE-5364-branch_4x.patch, LUCENE-5364-trunk.patch, > LUCENE-5364-trunk.patch > > > There are some hard-coded {{Version.LUCENE_XY}} constants used in various > places. Some of these are intentional and appropriate: > * in deprecated code, e.g. {{ArabicLetterTokenizer}}, deprecated in 3.1, uses > {{Version.LUCENE_31}} > * to make behavior version-dependent (e.g. {{StandardTokenizer}} and other > analysis components) > * to test different behavior at different points in history (e.g. > {{TestStopFilter}} to test position increments) > But should hard-coded constants be used elsewhere? > For those that should remain, and need to be updated with each release, there > should be an easy way to find them. 
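The ':Post-Release-Update-Version.LUCENE_XY:' token in the commits above exists so the remaining hard-coded constants can be located mechanically when the next release version changes. A hypothetical scanner in that spirit; only the token string comes from the commit message, the directory-walking code is purely illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FindVersionMarkers {
    // Marker placed next to Version.LUCENE_XY constants that must track releases
    static final String TOKEN = ":Post-Release-Update-Version.LUCENE_XY:";

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java"))
                 .forEach(FindVersionMarkers::scan);
        }
    }

    // Print file:line for every occurrence of the marker token
    static void scan(Path file) {
        try {
            int lineNo = 0;
            for (String line : Files.readAllLines(file)) {
                lineNo++;
                if (line.contains(TOKEN)) {
                    System.out.println(file + ":" + lineNo + ": " + line.trim());
                }
            }
        } catch (IOException e) {
            // skip unreadable files
        }
    }
}
```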
[jira] [Updated] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5541: - Attachment: SOLR-5541.patch > Allow QueryElevationComponent to accept elevateIds and excludeIds as http > parameters > > > Key: SOLR-5541 > URL: https://issues.apache.org/jira/browse/SOLR-5541 > Project: Solr > Issue Type: Improvement > Components: SearchComponents - other >Affects Versions: 4.6 >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Minor > Fix For: 4.7 > > Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch > > > The QueryElevationComponent currently uses an xml file to map query strings > to elevateIds and excludeIds. > This ticket adds the ability to pass in elevateIds and excludeIds through two > new http parameters "elevateIds" and "excludeIds". > This will allow more sophisticated business logic to be used in selecting > which ids to elevate/exclude. > Proposed syntax: > http://localhost:8983/solr/elevate?q=*:*&elevatedIds=3,4&excludeIds=6,8 > The elevateIds and excludeIds point to the unique document Id.
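The request-side contract being proposed is just comma-separated unique-id lists in two parameters. A rough sketch of the parsing step a handler would need; the parameter names follow the proposal above, but the helper itself is illustrative, not code from the attached patch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ElevateParams {
    // Split a comma-separated id parameter such as "3,4" into individual ids,
    // tolerating whitespace around the commas; absent/empty means no override.
    static List<String> parseIds(String param) {
        if (param == null || param.trim().isEmpty()) return Collections.emptyList();
        return Arrays.asList(param.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        // e.g. ...&elevateIds=3,4&excludeIds=6,8
        System.out.println(parseIds("3,4"));   // [3, 4]
        System.out.println(parseIds("6, 8"));  // [6, 8]
        System.out.println(parseIds(null));    // []
    }
}
```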
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
Hurray for the improbable... :) D. On Mon, Dec 9, 2013 at 10:36 PM, Simon Willnauer wrote: > nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1 > > On Mon, Dec 9, 2013 at 9:24 PM, wrote: >> Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/5920/ >> >> 1 tests failed. >> REGRESSION: org.apache.lucene.index.TestIndexableField.testArbitraryFields >> >> Error Message: >> -60 >> >> [...]
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843584#comment-13843584 ] Joel Bernstein commented on SOLR-5541: -- Thanks Mark, I'll fix that up. > Allow QueryElevationComponent to accept elevateIds and excludeIds as http > parameters > > > Key: SOLR-5541 > URL: https://issues.apache.org/jira/browse/SOLR-5541 > Project: Solr > Issue Type: Improvement > Components: SearchComponents - other >Affects Versions: 4.6 >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Minor > Fix For: 4.7 > > Attachments: SOLR-5541.patch, SOLR-5541.patch > > > The QueryElevationComponent currently uses an xml file to map query strings > to elevateIds and excludeIds. > This ticket adds the ability to pass in elevateIds and excludeIds through two > new http parameters "elevateIds" and "excludeIds". > This will allow more sophisticated business logic to be used in selecting > which ids to elevate/exclude. > Proposed syntax: > http://localhost:8983/solr/elevate?q=*:*&elevatedIds=3,4&excludeIds=6,8 > The elevateIds and excludeIds point to the unique document Id.
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
I committed a fix On Mon, Dec 9, 2013 at 9:36 PM, Simon Willnauer wrote: > nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1 > > On Mon, Dec 9, 2013 at 9:24 PM, wrote: >> Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/5920/ >> >> 1 tests failed. >> REGRESSION: org.apache.lucene.index.TestIndexableField.testArbitraryFields >> >> Error Message: >> -60 >> >> [...]
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
nice one we ran into Math.abs(Integer.MIN_VALUE) which returns -1 On Mon, Dec 9, 2013 at 9:24 PM, wrote: > Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/5920/ > > 1 tests failed. > REGRESSION: org.apache.lucene.index.TestIndexableField.testArbitraryFields > > Error Message: > -60 > > Stack Trace: > java.lang.ArrayIndexOutOfBoundsException: -60 > at > __randomizedtesting.SeedInfo.seed([516D1CE5843E2B26:7C99D9D3760B2809]:0) > at java.util.ArrayList.get(ArrayList.java:324) > at > org.apache.lucene.search.RandomSimilarityProvider.get(RandomSimilarityProvider.java:106) > > [...]
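For the record, the pitfall here is that Math.abs(Integer.MIN_VALUE) overflows and evaluates to Integer.MIN_VALUE itself, i.e. it stays negative; so an index computed as Math.abs(hash) % size, which is what the stack trace suggests happened in RandomSimilarityProvider, can go negative, hence the ArrayIndexOutOfBoundsException. A standalone demonstration; the size value is made up:

```java
public class AbsPitfall {
    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;   // e.g. a pathological hashCode()
        int size = 106;              // hypothetical list size

        // +2147483648 is not representable as an int, so abs overflows:
        System.out.println(Math.abs(h));         // -2147483648
        System.out.println(Math.abs(h) % size);  // -74: used as an index, this throws AIOOBE

        // Two common fixes: mask off the sign bit, or use Math.floorMod (Java 8+)
        System.out.println((h & 0x7fffffff) % size);
        System.out.println(Math.floorMod(h, size)); // always in [0, size)
    }
}
```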
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843573#comment-13843573 ] Mark Miller commented on SOLR-5541: --- +1 One comment: + assertQ("All six should make it", req Should update the copy/paste assert comment - only 5 should make it because b is excluded. > Allow QueryElevationComponent to accept elevateIds and excludeIds as http > parameters > > > Key: SOLR-5541 > URL: https://issues.apache.org/jira/browse/SOLR-5541 > Project: Solr > Issue Type: Improvement > Components: SearchComponents - other >Affects Versions: 4.6 >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Minor > Fix For: 4.7 > > Attachments: SOLR-5541.patch, SOLR-5541.patch > > > The QueryElevationComponent currently uses an xml file to map query strings > to elevateIds and excludeIds. > This ticket adds the ability to pass in elevateIds and excludeIds through two > new http parameters "elevateIds" and "excludeIds". > This will allow more sophisticated business logic to be used in selecting > which ids to elevate/exclude. > Proposed syntax: > http://localhost:8983/solr/elevate?q=*:*&elevatedIds=3,4&excludeIds=6,8 > The elevateIds and excludeIds point to the unique document Id.
[JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 5920 - Failure!
Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/5920/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexableField.testArbitraryFields Error Message: -60 Stack Trace: java.lang.ArrayIndexOutOfBoundsException: -60 at __randomizedtesting.SeedInfo.seed([516D1CE5843E2B26:7C99D9D3760B2809]:0) at java.util.ArrayList.get(ArrayList.java:324) at org.apache.lucene.search.RandomSimilarityProvider.get(RandomSimilarityProvider.java:106) at org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.computeNorm(PerFieldSimilarityWrapper.java:45) at org.apache.lucene.index.NormsConsumerPerField.finish(NormsConsumerPerField.java:49) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:201) at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1190) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:146) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:108) at org.apache.lucene.index.TestIndexableField.testArbitraryFields(TestIndexableField.java:191) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) 
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAf
[jira] [Commented] (SOLR-4983) Problematic core naming by collection create API
[ https://issues.apache.org/jira/browse/SOLR-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843571#comment-13843571 ] Mark Miller commented on SOLR-4983: --- bq. could anyone suggest if by creating cores separately (with the same collection name) we would achieve the same effect as creating collection via Collections API? By and large, currently, yes, this is supported. There is a flag that tracks if the collection was created with the collections api or not - and if it is, you will end up being able to use further features in the future - but currently you should be able to use the cores api to do what you want no problem. > Problematic core naming by collection create API > - > > Key: SOLR-4983 > URL: https://issues.apache.org/jira/browse/SOLR-4983 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Chris Toomey > > The SolrCloud collection create API creates cores named > "foo_shard_replica" when asked to create collection "foo". > This is problematic for at least 2 reasons: > 1) these ugly core names show up in the core admin UI, and will vary > depending on which node is being used, > 2) it prevents collections from being used in SolrCloud joins, since join > takes a core name as the fromIndex parameter and there's no single core name > for the collection. As I've documented in > https://issues.apache.org/jira/browse/SOLR-4905 and > http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4074038.html, > SolrCloud join does work when the inner collection (fromIndex) is not > sharded, assuming that collection is available and initialized at SolrCloud > bootstrap time. > Could this be changed to instead use the collection name for the core name? > Or at least add a core-name option to the API?
[jira] [Updated] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5541: - Attachment: SOLR-5541.patch Added test case > Allow QueryElevationComponent to accept elevateIds and excludeIds as http > parameters > > > Key: SOLR-5541 > URL: https://issues.apache.org/jira/browse/SOLR-5541 > Project: Solr > Issue Type: Improvement > Components: SearchComponents - other >Affects Versions: 4.6 >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Minor > Fix For: 4.7 > > Attachments: SOLR-5541.patch, SOLR-5541.patch > > > The QueryElevationComponent currently uses an xml file to map query strings > to elevateIds and excludeIds. > This ticket adds the ability to pass in elevateIds and excludeIds through two > new http parameters "elevateIds" and "excludeIds". > This will allow more sophisticated business logic to be used in selecting > which ids to elevate/exclude. > Proposed syntax: > http://localhost:8983/solr/elevate?q=*:*&elevatedIds=3,4&excludeIds=6,8 > The elevateIds and excludeIds point to the unique document Id.
[jira] [Comment Edited] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843524#comment-13843524 ] Steve Rowe edited comment on SOLR-1301 at 12/9/13 8:34 PM: --- bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only module that depends on all of the cdk-morphlines-* modules. was (Author: steve_rowe): bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only modules that depends on all of the cdk-morphlines-* modules. > Add a Solr contrib that allows for building Solr indexes via Hadoop's > Map-Reduce. > - > > Key: SOLR-1301 > URL: https://issues.apache.org/jira/browse/SOLR-1301 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, > SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, > commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, > hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, > log4j-1.2.15.jar > > > This patch contains a contrib module that provides distributed indexing > (using Hadoop) to Solr EmbeddedSolrServer. 
The idea behind this module is > twofold: > * provide an API that is familiar to Hadoop developers, i.e. that of > OutputFormat > * avoid unnecessary export and (de)serialization of data maintained on HDFS. > SolrOutputFormat consumes data produced by reduce tasks directly, without > storing it in intermediate files. Furthermore, by using an > EmbeddedSolrServer, the indexing task is split into as many parts as there > are reducers, and the data to be indexed is not sent over the network. > Design > -- > Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, > which in turn uses SolrRecordWriter to write this data. SolrRecordWriter > instantiates an EmbeddedSolrServer, and it also instantiates an > implementation of SolrDocumentConverter, which is responsible for turning > Hadoop (key, value) into a SolrInputDocument. This data is then added to a > batch, which is periodically submitted to EmbeddedSolrServer. When reduce > task completes, and the OutputFormat is closed, SolrRecordWriter calls > commit() and optimize() on the EmbeddedSolrServer. > The API provides facilities to specify an arbitrary existing solr.home > directory, from which the conf/ and lib/ files will be taken. > This process results in the creation of as many partial Solr home directories > as there were reduce tasks. The output shards are placed in the output > directory on the default filesystem (e.g. HDFS). Such part-N directories > can be used to run N shard servers. Additionally, users can specify the > number of reduce tasks, in particular 1 reduce task, in which case the output > will consist of a single shard. > An example application is provided that processes large CSV files and uses > this API. It uses a custom CSV processing to avoid (de)serialization overhead. > This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this > issue, you should put it in contrib/hadoop/lib. 
> Note: the development of this patch was sponsored by an anonymous contributor > and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
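The batching design described above — documents accumulate per reduce task, are flushed periodically, and a commit happens once when the writer closes — can be sketched independently of Hadoop. This stub is purely illustrative: the patch's SolrRecordWriter feeds a real EmbeddedSolrServer, which is replaced here by flush/commit counters:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative stand-in for the patch's SolrRecordWriter batching logic;
 * the real class submits batches to an EmbeddedSolrServer, this one only
 * tracks flushes so the control flow is visible.
 */
public class BatchingWriter {
    private final int batchSize;
    private final List<String> batch = new ArrayList<>();
    int flushes = 0;        // number of times a batch was submitted
    boolean committed = false;

    BatchingWriter(int batchSize) { this.batchSize = batchSize; }

    /** Analogous to write(key, value): convert one record and buffer it. */
    void write(String doc) {
        batch.add(doc);
        if (batch.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        // In the patch this is where the batch goes to EmbeddedSolrServer.
        batch.clear();
        flushes++;
    }

    /** Analogous to close(): flush the tail, then commit once per reduce task. */
    void close() {
        if (!batch.isEmpty()) flush();
        committed = true;   // the patch calls commit() and optimize() here
    }

    public static void main(String[] args) {
        BatchingWriter w = new BatchingWriter(2);
        for (int i = 0; i < 5; i++) w.write("doc" + i);
        w.close();
        System.out.println(w.flushes + " flushes, committed=" + w.committed);
    }
}
```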
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843524#comment-13843524 ] Steve Rowe commented on SOLR-1301: -- bq. And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? I can answer this one myself from [https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-all/pom.xml]: it's an aggregation-only modules that depends on all of the cdk-morphlines-* modules. > Add a Solr contrib that allows for building Solr indexes via Hadoop's > Map-Reduce. > - > > Key: SOLR-1301 > URL: https://issues.apache.org/jira/browse/SOLR-1301 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, > SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, > commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, > hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, > log4j-1.2.15.jar > > > This patch contains a contrib module that provides distributed indexing > (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is > twofold: > * provide an API that is familiar to Hadoop developers, i.e. that of > OutputFormat > * avoid unnecessary export and (de)serialization of data maintained on HDFS. > SolrOutputFormat consumes data produced by reduce tasks directly, without > storing it in intermediate files. 
Furthermore, by using an > EmbeddedSolrServer, the indexing task is split into as many parts as there > are reducers, and the data to be indexed is not sent over the network. > Design > -- > Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, > which in turn uses SolrRecordWriter to write this data. SolrRecordWriter > instantiates an EmbeddedSolrServer, and it also instantiates an > implementation of SolrDocumentConverter, which is responsible for turning > Hadoop (key, value) into a SolrInputDocument. This data is then added to a > batch, which is periodically submitted to EmbeddedSolrServer. When reduce > task completes, and the OutputFormat is closed, SolrRecordWriter calls > commit() and optimize() on the EmbeddedSolrServer. > The API provides facilities to specify an arbitrary existing solr.home > directory, from which the conf/ and lib/ files will be taken. > This process results in the creation of as many partial Solr home directories > as there were reduce tasks. The output shards are placed in the output > directory on the default filesystem (e.g. HDFS). Such part-N directories > can be used to run N shard servers. Additionally, users can specify the > number of reduce tasks, in particular 1 reduce task, in which case the output > will consist of a single shard. > An example application is provided that processes large CSV files and uses > this API. It uses a custom CSV processing to avoid (de)serialization overhead. > This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this > issue, you should put it in contrib/hadoop/lib. > Note: the development of this patch was sponsored by an anonymous contributor > and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843523#comment-13843523 ] wolfgang hoschek commented on SOLR-1301: Apologies for the confusion. We are upstreaming cdk-morphlines-solr-cell into the solr contrib solr-morphlines-cell as well as cdk-morphlines-solr-core into the solr contrib solr-morphlines-core as well as search-mr into the solr contrib solr-map-reduce. Once the upstreaming is done these old modules will go away. Next, "downstream" will be made identical to "upstream" plus perhaps some critical fixes as necessary, and the upstream/downstream terms will apply in the way folks usually think about them, but we are not quite yet there today, but getting there... cdk-morphlines-all is simply a convenience pom that includes all the other morphline poms so there's less to type for users who like a bit more auto magic. > Add a Solr contrib that allows for building Solr indexes via Hadoop's > Map-Reduce. > - > > Key: SOLR-1301 > URL: https://issues.apache.org/jira/browse/SOLR-1301 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, > SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, > commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, > hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, > log4j-1.2.15.jar > > > This patch contains a contrib module that provides distributed indexing > (using Hadoop) to Solr EmbeddedSolrServer. 
The idea behind this module is > twofold: > * provide an API that is familiar to Hadoop developers, i.e. that of > OutputFormat > * avoid unnecessary export and (de)serialization of data maintained on HDFS. > SolrOutputFormat consumes data produced by reduce tasks directly, without > storing it in intermediate files. Furthermore, by using an > EmbeddedSolrServer, the indexing task is split into as many parts as there > are reducers, and the data to be indexed is not sent over the network. > Design > -- > Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, > which in turn uses SolrRecordWriter to write this data. SolrRecordWriter > instantiates an EmbeddedSolrServer, and it also instantiates an > implementation of SolrDocumentConverter, which is responsible for turning > Hadoop (key, value) into a SolrInputDocument. This data is then added to a > batch, which is periodically submitted to EmbeddedSolrServer. When reduce > task completes, and the OutputFormat is closed, SolrRecordWriter calls > commit() and optimize() on the EmbeddedSolrServer. > The API provides facilities to specify an arbitrary existing solr.home > directory, from which the conf/ and lib/ files will be taken. > This process results in the creation of as many partial Solr home directories > as there were reduce tasks. The output shards are placed in the output > directory on the default filesystem (e.g. HDFS). Such part-N directories > can be used to run N shard servers. Additionally, users can specify the > number of reduce tasks, in particular 1 reduce task, in which case the output > will consist of a single shard. > An example application is provided that processes large CSV files and uses > this API. It uses a custom CSV processing to avoid (de)serialization overhead. > This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this > issue, you should put it in contrib/hadoop/lib. 
> Note: the development of this patch was sponsored by an anonymous contributor > and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843496#comment-13843496 ] Steve Rowe commented on SOLR-1301: -- [~whoschek], I'm lost: what do you mean by "upstream"/"downstream"? In my experience, "upstream" refers to a parent project, i.e. one from which the project in question is derived, and "downstream" is the child/derived project. I don't know the history here, but you seem to be referring to the solr contribs when you say "upstream"? If that's true, then my understanding of these terms is the opposite of how you're using them. Maybe the question I should be asking is: what is/are the relationship(s) between/among cdk-morphlines-solr-* and solr-morphlines-*? And (I assume) relatedly, how how does cdk-morphlines-all relate to cdk-morphlines-solr-core/-cell? > Add a Solr contrib that allows for building Solr indexes via Hadoop's > Map-Reduce. > - > > Key: SOLR-1301 > URL: https://issues.apache.org/jira/browse/SOLR-1301 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, > SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, > commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, > hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, > log4j-1.2.15.jar > > > This patch contains a contrib module that provides distributed indexing > (using Hadoop) to Solr EmbeddedSolrServer. 
The idea behind this module is > twofold: > * provide an API that is familiar to Hadoop developers, i.e. that of > OutputFormat > * avoid unnecessary export and (de)serialization of data maintained on HDFS. > SolrOutputFormat consumes data produced by reduce tasks directly, without > storing it in intermediate files. Furthermore, by using an > EmbeddedSolrServer, the indexing task is split into as many parts as there > are reducers, and the data to be indexed is not sent over the network. > Design > -- > Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, > which in turn uses SolrRecordWriter to write this data. SolrRecordWriter > instantiates an EmbeddedSolrServer, and it also instantiates an > implementation of SolrDocumentConverter, which is responsible for turning > Hadoop (key, value) into a SolrInputDocument. This data is then added to a > batch, which is periodically submitted to EmbeddedSolrServer. When reduce > task completes, and the OutputFormat is closed, SolrRecordWriter calls > commit() and optimize() on the EmbeddedSolrServer. > The API provides facilities to specify an arbitrary existing solr.home > directory, from which the conf/ and lib/ files will be taken. > This process results in the creation of as many partial Solr home directories > as there were reduce tasks. The output shards are placed in the output > directory on the default filesystem (e.g. HDFS). Such part-N directories > can be used to run N shard servers. Additionally, users can specify the > number of reduce tasks, in particular 1 reduce task, in which case the output > will consist of a single shard. > An example application is provided that processes large CSV files and uses > this API. It uses a custom CSV processing to avoid (de)serialization overhead. > This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this > issue, you should put it in contrib/hadoop/lib. 
> Note: the development of this patch was sponsored by an anonymous contributor > and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5542) Global query parameters to facet queries
Isaac Hebsh created SOLR-5542: - Summary: Global query parameters to facet queries Key: SOLR-5542 URL: https://issues.apache.org/jira/browse/SOLR-5542 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.6 Reporter: Isaac Hebsh (From the Mailing List) It seems that a facet query does not use the global query parameters (for example, field aliasing for the edismax parser). We make intensive use of facet queries (in some cases, we have a lot of facet.query parameters for a single q), and using LocalParams for each facet.query is not convenient. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
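The workaround the reporter finds inconvenient can be made concrete: until facet queries inherit global parameters, every facet.query must carry its own LocalParams prefix. A hypothetical helper illustrating the repetition (the parameter names are Solr's; the helper itself is not):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrates the repetition: every facet.query re-states the same LocalParams. */
public class FacetParamsSketch {

    /** Prefixes each facet query with identical LocalParams, e.g. {!edismax}. */
    static String buildFacetParams(String localParams, List<String> facetQueries) {
        return facetQueries.stream()
                .map(fq -> "facet.query=" + localParams + fq)
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Without global-param inheritance, the {!edismax} prefix repeats per query.
        System.out.println(buildFacetParams("{!edismax}",
                Arrays.asList("price:[0 TO 10]", "price:[10 TO *]")));
    }
}
```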
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843462#comment-13843462 ] Mark Miller commented on SOLR-5473: --- bq. if(debugState && Best to do that with a debug logging level rather than introducing a debug sys prop for this class. > Make one state.json per collection > -- > > Key: SOLR-5473 > URL: https://issues.apache.org/jira/browse/SOLR-5473 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch > > > As defined in the parent issue, store the states of each collection under > /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
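Mark's suggestion — gate the output on a debug log level instead of a `debugState` system property — is the usual guarded-logging pattern. A minimal sketch with the JDK logger standing in for Solr's slf4j logger (FINE plays the role of DEBUG here; the class and method names are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of level-guarded debug logging instead of a custom sys prop. */
public class StateDebugLog {
    private static final Logger log = Logger.getLogger(StateDebugLog.class.getName());

    static void publishState(String stateJson) {
        // Operators can toggle this per-logger at runtime through logging
        // config; no JVM restart or custom -DdebugState flag is needed.
        if (log.isLoggable(Level.FINE)) {
            log.fine("cluster state: " + stateJson);
        }
    }

    public static void main(String[] args) {
        log.setLevel(Level.FINE);
        publishState("{\"collections\":{}}");
    }
}
```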
[jira] [Comment Edited] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843443#comment-13843443 ] wolfgang hoschek edited comment on SOLR-1301 at 12/9/13 7:30 PM: - I'm not aware of anything needing jersey except perhaps hadoop pulls that in. The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that currently solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app that bundles user level deps). Correspondingly, would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all minus cdk-morphlines-solr-cell (now upstream) minus cdk-morphlines-solr-core (now upstream) plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml was (Author: whoschek): I'm not aware of anything needing jersey except perhaps hadoop pulls that in. 
The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app). Would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml > Add a Solr contrib that allows for building Solr indexes via Hadoop's > Map-Reduce. 
> - > > Key: SOLR-1301 > URL: https://issues.apache.org/jira/browse/SOLR-1301 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, > SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, > commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, > hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, > log4j-1.2.15.jar > > > This patch contains a contrib module that provides distributed indexing > (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is > twofold: > * provide an API that is familiar to Hadoop developers, i.e. that of > OutputFormat > * avoid unnecessary export and (de)serialization of data mainta
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843443#comment-13843443 ] wolfgang hoschek commented on SOLR-1301: I'm not aware of anything needing jersey except perhaps hadoop pulls that in. The combined dependencies of all morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The dependencies of each individual morphline modules is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/cdk-morphlines-all/dependencies.html The source and POMs are here, as usual: https://github.com/cloudera/cdk/tree/master/cdk-morphlines By the way, a somewhat separate issue is that it seems to me that the ivy dependences for solr-morphlines-core and solr-morphlines-cell and solr-map-reduce are a bit backwards upstream in that solr-morphlines-core pulls in a ton of dependencies that it doesn't need, and those deps should rather be pulled in by the solr-map-reduce (which is a essentially an out-of-the-box app). Would be good to organize ivy and mvn upstream in such a way that * solr-map-reduce should depend on solr-morphlines-cell plus cdk-morphlines-all plus xyz * solr-morphlines-cell should depend on solr-morphlines-core plus xyz * solr-morphlines-core should depend on cdk-morphlines-core plus xyz More concretely, FWIW, to see how the deps look like in production releases downstream review the following POMs: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-core/pom.xml and https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-solr-cell/pom.xml and https://github.com/cloudera/search/blob/master_1.1.0/search-mr/pom.xml > Add a Solr contrib that allows for building Solr indexes via Hadoop's > Map-Reduce. 
> - > > Key: SOLR-1301 > URL: https://issues.apache.org/jira/browse/SOLR-1301 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, > SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, > commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, > hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, > log4j-1.2.15.jar > > > This patch contains a contrib module that provides distributed indexing > (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is > twofold: > * provide an API that is familiar to Hadoop developers, i.e. that of > OutputFormat > * avoid unnecessary export and (de)serialization of data maintained on HDFS. > SolrOutputFormat consumes data produced by reduce tasks directly, without > storing it in intermediate files. Furthermore, by using an > EmbeddedSolrServer, the indexing task is split into as many parts as there > are reducers, and the data to be indexed is not sent over the network. > Design > -- > Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, > which in turn uses SolrRecordWriter to write this data. SolrRecordWriter > instantiates an EmbeddedSolrServer, and it also instantiates an > implementation of SolrDocumentConverter, which is responsible for turning > Hadoop (key, value) into a SolrInputDocument. This data is then added to a > batch, which is periodically submitted to EmbeddedSolrServer. 
When reduce > task completes, and the OutputFormat is closed, SolrRecordWriter calls > commit() and optimize() on the EmbeddedSolrServer. > The API provides facilities to specify an arbitrary existing solr.home > directory, from which the conf/ and lib/ files will be taken. > This process results in the creation of as many partial Solr home directories > as there were reduce tasks. The output shards are placed in the output > directory on the default filesystem (e.g. HDFS). Such part-N directories > can be used to run N shard servers. Additionally, users can specify the > number of reduce tasks, in particular 1 reduce task, in which case the output > will consist of a single shard. > An example application is provided that processes large CSV files and uses > this API. It uses a custom CSV processing to avoid (de)serialization overhead. > This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this > issue, you should put it in contrib/hadoop/lib. > No
[jira] [Commented] (SOLR-5467) Provide Solr Ref Guide in .epub format
[ https://issues.apache.org/jira/browse/SOLR-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843413#comment-13843413 ] Hoss Man commented on SOLR-5467: Thread where this initially came up: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3c528a1321.4060...@hebis.uni-frankfurt.de%3E > Provide Solr Ref Guide in .epub format > -- > > Key: SOLR-5467 > URL: https://issues.apache.org/jira/browse/SOLR-5467 > Project: Solr > Issue Type: Wish > Components: documentation >Reporter: Cassandra Targett > > From the solr-user list, a request for an .epub version of the Solr Ref Guide. > There are two possible approaches that immediately come to mind: > * Ask infra to install a plugin that automatically outputs the Confluence > pages in .epub > * Investigate converting HTML export to .epub with something like calibre > There might be other options, and there would be additional issues for > automating the process of creation and publication, so for now just recording > the request with an issue. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843343#comment-13843343 ] Mark Miller commented on SOLR-1301: --- bq. if we need some of the classes this jar provides, we should declare direct dependencies on the appropriate artifacts. Right - Wolfgang likely knows best when it comes to Morphlines.. At a minimum we should pull the necessary jars in explicitly I think. I've got to take a look at what they are. > Add a Solr contrib that allows for building Solr indexes via Hadoop's > Map-Reduce. > - > > Key: SOLR-1301 > URL: https://issues.apache.org/jira/browse/SOLR-1301 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, > SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, > SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, > commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, > hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, > log4j-1.2.15.jar > > > This patch contains a contrib module that provides distributed indexing > (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is > twofold: > * provide an API that is familiar to Hadoop developers, i.e. that of > OutputFormat > * avoid unnecessary export and (de)serialization of data maintained on HDFS. > SolrOutputFormat consumes data produced by reduce tasks directly, without > storing it in intermediate files. 
Furthermore, by using an > EmbeddedSolrServer, the indexing task is split into as many parts as there > are reducers, and the data to be indexed is not sent over the network. > Design > -- > Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, > which in turn uses SolrRecordWriter to write this data. SolrRecordWriter > instantiates an EmbeddedSolrServer, and it also instantiates an > implementation of SolrDocumentConverter, which is responsible for turning > Hadoop (key, value) into a SolrInputDocument. This data is then added to a > batch, which is periodically submitted to EmbeddedSolrServer. When reduce > task completes, and the OutputFormat is closed, SolrRecordWriter calls > commit() and optimize() on the EmbeddedSolrServer. > The API provides facilities to specify an arbitrary existing solr.home > directory, from which the conf/ and lib/ files will be taken. > This process results in the creation of as many partial Solr home directories > as there were reduce tasks. The output shards are placed in the output > directory on the default filesystem (e.g. HDFS). Such part-N directories > can be used to run N shard servers. Additionally, users can specify the > number of reduce tasks, in particular 1 reduce task, in which case the output > will consist of a single shard. > An example application is provided that processes large CSV files and uses > this API. It uses a custom CSV processing to avoid (de)serialization overhead. > This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this > issue, you should put it in contrib/hadoop/lib. > Note: the development of this patch was sponsored by an anonymous contributor > and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5473: - Attachment: SOLR-5473.patch a couple of tests fail > Make one state.json per collection > -- > > Key: SOLR-5473 > URL: https://issues.apache.org/jira/browse/SOLR-5473 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch > > > As defined in the parent issue, store the states of each collection under > /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.4#6159) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
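As a rough sketch of the layout described in the issue (the exact surrounding ZooKeeper tree is an assumption; only the `/collections/collectionname/state.json` path comes from the issue text):

```
/collections
    /collection1
        state.json    <- state for collection1 only
    /collection2
        state.json    <- state for collection2 only
```

Keeping one state.json per collection means a watcher or reader of one collection's state no longer has to process the states of all the others.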
[jira] [Resolved] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-5525. -- Resolution: Fixed > deprecate ClusterState#getCollectionStates() > - > > Key: SOLR-5525 > URL: https://issues.apache.org/jira/browse/SOLR-5525 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5525.patch, SOLR-5525.patch > > > This is a very expensive call if there is a large number of collections. > Mostly, it is used to check if a collection exists
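Why the call is expensive when it is only used for an existence check can be sketched as follows. This is a hypothetical model, not the actual ClusterState implementation: it assumes collection states are held lazily (e.g. fetched from ZooKeeper on demand), so returning all states forces every one to load, while a membership test needs only the key set:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the cost difference behind this deprecation.
// If each collection's state is loaded lazily, materializing the full map
// (the getCollectionStates() pattern) forces every state to load, whereas
// an existence check (the hasCollection() pattern) loads nothing.
// Names are illustrative, not the real ClusterState API.
public class LazyClusterState {
    private final Map<String, Supplier<String>> lazyStates = new HashMap<>();
    private int loads = 0;

    public void addCollection(String name) {
        // The supplier stands in for an on-demand fetch of the state.
        lazyStates.put(name, () -> { loads++; return "state-of-" + name; });
    }

    // Expensive pattern: materializes every collection's state.
    public Map<String, String> getCollectionStates() {
        Map<String, String> all = new HashMap<>();
        for (Map.Entry<String, Supplier<String>> e : lazyStates.entrySet()) {
            all.put(e.getKey(), e.getValue().get());
        }
        return all;
    }

    // Cheap pattern: a key lookup, no state is loaded at all.
    public boolean hasCollection(String name) {
        return lazyStates.containsKey(name);
    }

    public int getLoadCount() { return loads; }
}
```

Under this model, callers that only need to know whether a collection exists pay zero state-loading cost with the direct check.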
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843220#comment-13843220 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549592 from [~noble.paul] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549592 ] SOLR-5525 > deprecate ClusterState#getCollectionStates() > - > > Key: SOLR-5525 > URL: https://issues.apache.org/jira/browse/SOLR-5525 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5525.patch, SOLR-5525.patch > > > This is a very expensive call if there is a large number of collections. > Mostly, it is used to check if a collection exists
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843217#comment-13843217 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549591 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1549591 ] SOLR-5525 > deprecate ClusterState#getCollectionStates() > - > > Key: SOLR-5525 > URL: https://issues.apache.org/jira/browse/SOLR-5525 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5525.patch, SOLR-5525.patch > > > This is a very expensive call if there is a large number of collections. > Mostly, it is used to check if a collection exists
[jira] [Commented] (SOLR-4386) Variable expansion doesn't work in DIH SimplePropertiesWriter's filename
[ https://issues.apache.org/jira/browse/SOLR-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843099#comment-13843099 ] Ryuzo Yamamoto commented on SOLR-4386: -- Hi! Do you have a plan to fix this? I also want to use variable expansion in SimplePropertiesWriter's filename. > Variable expansion doesn't work in DIH SimplePropertiesWriter's filename > > > Key: SOLR-4386 > URL: https://issues.apache.org/jira/browse/SOLR-4386 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 4.1 >Reporter: Jonas Birgander >Assignee: Shalin Shekhar Mangar > Labels: dataimport > Attachments: SOLR-4386.patch > > > I'm testing Solr 4.1, but I've run into some problems with > DataImportHandler's new propertyWriter tag. > I'm trying to use variable expansion in the `filename` field when using > SimplePropertiesWriter. > Here are the relevant parts of my configuration: > conf/solrconfig.xml > - > class="org.apache.solr.handler.dataimport.DataImportHandler"> > > db-data-config.xml > > > > ${country_code} > > > > conf/db-data-config.xml > - > >dateFormat="-MM-dd HH:mm:ss" > type="SimplePropertiesWriter" > directory="conf" > filename="${dataimporter.request.country_code}.dataimport.properties" > /> >driver="${dataimporter.request.db_driver}" > url="${dataimporter.request.db_url}" > user="${dataimporter.request.db_user}" > password="${dataimporter.request.db_password}" > batchSize="${dataimporter.request.db_batch_size}" /> > >query="my normal SQL, not really relevant > -- country=${dataimporter.request.country_code}"> > > > > > > > > If country_code is set to "gb", I want the last_index_time to be read and > written in the file conf/gb.dataimport.properties, instead of the default > conf/dataimport.properties > The variable expansion works perfectly in the SQL and setup of the data > source, but not in the property writer's filename field.
> When initiating an import, the log file shows: > Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter > maybeReloadConfiguration > INFO: Loading DIH Configuration: db-data-config.xml > Jan 30, 2013 11:25:42 AM > org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema > INFO: The field :$skipDoc present in DataConfig does not have a counterpart > in Solr Schema > Jan 30, 2013 11:25:42 AM > org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema > INFO: The field :$deleteDocById present in DataConfig does not have a > counterpart in Solr Schema > Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter > loadDataConfig > INFO: Data Configuration loaded successfully > Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter > doFullImport > INFO: Starting Full Import > Jan 30, 2013 11:25:42 AM > org.apache.solr.handler.dataimport.SimplePropertiesWriter > readIndexerProperties > WARNING: Unable to read: > ${dataimporter.request.country_code}.dataimport.properties
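The WARNING above shows the filename reaching the properties reader with the `${...}` placeholder still unresolved. The behavior the reporter expects, substituting request variables into the filename before use, can be sketched generically. This is a stand-in illustration, not DIH's actual VariableResolver:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the ${...} expansion the reporter expects to also
// apply to the propertyWriter's filename attribute. A generic stand-in,
// not the actual DIH implementation.
public class VariableExpander {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    public static String expand(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown variables are left untouched, which mirrors the
            // literal "${...}.dataimport.properties" seen in the log above.
            String value = vars.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With `dataimporter.request.country_code=gb`, expanding `${dataimporter.request.country_code}.dataimport.properties` would yield `gb.dataimport.properties`, the per-country file the reporter wants.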
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843074#comment-13843074 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549554 from [~noble.paul] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1549554 ] SOLR-5525 > deprecate ClusterState#getCollectionStates() > - > > Key: SOLR-5525 > URL: https://issues.apache.org/jira/browse/SOLR-5525 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5525.patch, SOLR-5525.patch > > > This is a very expensive call if there is a large number of collections. > Mostly, it is used to check if a collection exists
[jira] [Commented] (SOLR-5525) deprecate ClusterState#getCollectionStates()
[ https://issues.apache.org/jira/browse/SOLR-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843070#comment-13843070 ] ASF subversion and git services commented on SOLR-5525: --- Commit 1549552 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1549552 ] SOLR-5525 > deprecate ClusterState#getCollectionStates() > - > > Key: SOLR-5525 > URL: https://issues.apache.org/jira/browse/SOLR-5525 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5525.patch, SOLR-5525.patch > > > This is a very expensive call if there is a large number of collections. > Mostly, it is used to check if a collection exists
[jira] [Assigned] (SOLR-3191) field exclusion from fl
[ https://issues.apache.org/jira/browse/SOLR-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-3191: --- Assignee: (was: Shalin Shekhar Mangar) I don't have time right now to review this. I assigned it to myself because there was a lot of public interest but no assignee. However, it looks like a couple of other committers have interest in this issue as well. I can only look at this after a few weeks, so if no one takes it up, then I will. > field exclusion from fl > --- > > Key: SOLR-3191 > URL: https://issues.apache.org/jira/browse/SOLR-3191 > Project: Solr > Issue Type: Improvement >Reporter: Luca Cavanna >Priority: Minor > Attachments: SOLR-3191.patch, SOLR-3191.patch > > > I think it would be useful to add a way to exclude fields from the Solr > response. If I have, for example, 100 stored fields and I want to return all of > them but one, it would be handy to list just the field I want to exclude > instead of the 99 fields for inclusion through fl.
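The proposed semantics, return the complement of the excluded fields rather than enumerating 99 of 100 fields in `fl`, amount to a set difference. A minimal sketch (the resolver name and exclusion list are illustrative, not a shipped Solr API at the time of this issue):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the proposal in SOLR-3191: given the schema's stored fields
// and a set of exclusions, the effective field list is the complement.
// Names here are illustrative, not an actual Solr class.
public class FieldListResolver {
    public static Set<String> resolve(Set<String> storedFields, Set<String> excluded) {
        // LinkedHashSet preserves the schema's field order in the result.
        Set<String> result = new LinkedHashSet<>(storedFields);
        result.removeAll(excluded);
        return result;
    }
}
```

So with 100 stored fields, a request excluding one field yields the other 99 without listing them.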
[jira] [Commented] (SOLR-1632) Distributed IDF
[ https://issues.apache.org/jira/browse/SOLR-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843025#comment-13843025 ] Markus Jelsma commented on SOLR-1632: - It is much faster now, even usable. But I haven't tried it in a larger cluster yet. > Distributed IDF > --- > > Key: SOLR-1632 > URL: https://issues.apache.org/jira/browse/SOLR-1632 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.5 >Reporter: Andrzej Bialecki >Assignee: Mark Miller > Fix For: 5.0, 4.7 > > Attachments: 3x_SOLR-1632_doesntwork.patch, SOLR-1632.patch, > SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, > SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, SOLR-1632.patch, > distrib-2.patch, distrib.patch > > > Distributed IDF is a valuable enhancement for distributed search across > non-uniform shards. This issue tracks the proposed implementation of an API > to support this functionality in Solr.
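The core idea behind distributed IDF is that document frequencies are aggregated across shards so idf is computed from global statistics rather than each shard's local ones, which is what makes scores comparable across non-uniform shards. A simplified sketch follows; the idf formula is Lucene's classic `1 + ln(numDocs / (docFreq + 1))`, but the aggregation step here is an illustration, not the actual protocol in the patches:

```java
// Illustrative sketch of global IDF: sum per-shard document frequencies
// and per-shard document counts for a term, then compute idf once from
// the global totals. The aggregation here is a simplification of the
// real distributed exchange, not the API proposed in the issue.
public class GlobalIdf {
    // Lucene's classic idf formula.
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static double globalIdf(long[] shardDocFreqs, long[] shardNumDocs) {
        long df = 0, n = 0;
        for (long d : shardDocFreqs) df += d;
        for (long c : shardNumDocs) n += c;
        return idf(df, n);
    }
}
```

With uniform shards the global value matches the local one, but when a term is concentrated on one shard, local idf understates its rarity elsewhere, which is exactly the non-uniform-shard problem the issue describes.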