Fwd: [Orekit Developers] Distribution of orekit-python egg
Hi JCC developers,

As this discussion could be of interest to others who make Python modules using JCC, I am cross-posting it. I am really not experienced in the licensing issues of open source, closed source etc. A JCC-generated external Java library .egg does contain some JCC code; is there something I need to do to ensure that the JCC license is fulfilled? In my case the wrapped library is also under Apache, so I guess it should not be an issue to put the egg under Apache?

Regards /Petrus

-- Forwarded message -- From: Petrus Hyvönen petrus.hyvo...@gmail.com Date: Fri, Sep 9, 2011 at 10:33 AM Subject: Re: [Orekit Developers] Distribution of orekit-python egg To: orekit-develop...@orekit.org

Hi Luc, Good to hear, I think it is good to have an easy entry to Orekit through Python. I really don't know much about the licensing stuff.
- My work in this process is rather minimal: issuing the right command-line parameters.
- The JCC tool that I use for wrapping is under the Apache 2.0 license as well. http://pypi.python.org/pypi/JCC/
- Orekit and the commons-math library are included in the egg (commons-math assumed to be under Apache as well).
The Microsoft compiler has been used in the process, but as I understand it this does not affect the licensing? Please send me the paper and I can understand it better; I am not really clear right now on what the actual content of the authorization is. Regards /Petrus

On Fri, Sep 9, 2011 at 10:04 AM, MAISONOBE Luc luc.maison...@c-s.fr wrote: Hi Petrus, Petrus Hyvönen petrus.hyvo...@gmail.com wrote: Hi, This is mostly a legal/policy question: would the license of Orekit allow, and would it be preferred, that I create some Python eggs (Python installer packages) for Orekit? The license allows it and it would be interesting to do. I am using JCC to wrap the Orekit Java library so it is accessible from standard Python. All instructions needed are on the Orekit wiki.
After updating to Python 2.7 and rebuilding the wrapping, I realized that there are some caveats, at least on the Windows operating system (you need Microsoft Visual C installed to create the wrapping, but not to run it). Once built, I can easily make the eggs, which could be put on the Orekit or another web page and easily installed, without the need for that compiler etc. If you want it, and if you choose to have all additional wrapping code and packaging distributed under the same terms as Orekit, i.e. the Apache 2 Software License, then yes, we could host them here on the main Orekit site. You would of course be credited for that. Before we can really distribute your work, we would ask you to sign an Individual Contributor License Agreement, which basically says you authorize us to distribute what you did. We follow exactly the same process as the Apache Software Foundation for that. I can send you the template of the paper. Thanks Luc

Let me know what you think /Petrus

--**--** This message was sent using IMP, the Internet Messaging Program.

-- _ Petrus Hyvönen, Uppsala, Sweden Mobile Phone/SMS:+46 73 803 19 00
[jira] [Created] (SOLR-2751) TermsComponent terms.regex and terms.upper does not always work
TermsComponent terms.regex and terms.upper does not always work --- Key: SOLR-2751 URL: https://issues.apache.org/jira/browse/SOLR-2751 Project: Solr Issue Type: Bug Components: SearchComponents - other Affects Versions: 3.3 Environment: Solr 3.3 Reporter: Stephan Meisinger

TermsComponent with a regex checks the upper bound only when the regexp matches. Example:
terms.regex.flag=case_insensitive
terms.fl=suggest_fr
terms.limit=10
terms.regex=a.*
terms.lower=A
terms.upper=b
will also check terms starting with 'b' up to 'z', but this wouldn't be needed; for this example upper is ignored.

Currently the checks are done as:
[lower] - start loop at
[regexp] - miss: continue
[upper] - miss: break
[freq] - miss: continue

They should be done as:
[lower] - start loop at
[upper] - miss: break
[freq] - miss: continue
[regexp] - miss: continue
(I think a double compare is much faster than a std regexp.)

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
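The proposed reordering can be sketched in plain Java. This is a hypothetical stand-in for the TermsComponent loop, not the actual Solr code: the cheap upper-bound string compare runs before the regex, so the loop breaks out early instead of running the regex against every remaining term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TermFilterOrder {
    // Hypothetical sketch of the proposed check order, not Solr's actual
    // TermsComponent code: [lower] -> [upper] -> [regexp].
    // The upper-bound compare runs before the regex so the loop can break
    // early; the regex only sees terms already inside the bounds.
    public static List<String> filter(List<String> sortedTerms, String lower,
                                      String upper, Pattern regex, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : sortedTerms) {
            if (term.compareTo(lower) < 0) continue;      // [lower] start loop at
            if (term.compareTo(upper) >= 0) break;        // [upper] miss: break
            if (!regex.matcher(term).matches()) continue; // [regexp] miss: continue
            out.add(term);
            if (out.size() >= limit) break;               // terms.limit
        }
        return out;
    }
}
```

With the terms sorted, everything at or past the upper bound is never handed to the regex at all, which is the saving the report describes.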
[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101075#comment-13101075 ] Simon Willnauer commented on LUCENE-3416: - I am planning to commit this today, any objections?

Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon Assignee: Simon Willnauer Attachments: LUCENE-3416.patch, LUCENE-3416.patch This can come in handy when running several Lucene indices in the same VM, and wishing to rate limit merges across all of them.
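The idea of sharing one limiter across several directories can be sketched as follows. This is an illustrative class, not Lucene's actual RateLimiter API: the pause time is derived from a configured MB/sec budget, and because one instance is shared, all directories writing through it are throttled against the same combined budget.

```java
public class SharedRateLimiter {
    // Illustrative sketch, not Lucene's RateLimiter API: computes how long
    // a write of `bytes` should take at the configured rate. Passing one
    // shared instance to several directories throttles their merge IO
    // against a single combined budget.
    private final double mbPerSec;

    public SharedRateLimiter(double mbPerSec) {
        this.mbPerSec = mbPerSec;
    }

    // nanoseconds a caller should pause after writing `bytes`
    public long nanosToPause(long bytes) {
        double seconds = bytes / (mbPerSec * 1024 * 1024);
        return (long) (seconds * 1_000_000_000L);
    }
}
```

A real implementation would sleep for the computed interval inside the directory's output stream; the point here is only that the rate state lives in one object rather than per directory.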
Re: Regarding Transaction logging
I created LUCENE-3424 for this, but I would still like to keep the discussion open here rather than moving it entirely to an issue. There is more to this than just the sequence ids.

simon

On Thu, Sep 8, 2011 at 5:35 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Sep 8, 2011 at 11:26 AM, Michael McCandless luc...@mikemccandless.com wrote: Returning a long seqID seems the least invasive change to make this total ordering possible? Especially since the DWDQ already computes this order... +1 This seems like the most powerful option. -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference
[jira] [Created] (LUCENE-3424) Return sequence ids from IW update/delete/add/commit to allow total ordering outside of IW
Return sequence ids from IW update/delete/add/commit to allow total ordering outside of IW -- Key: LUCENE-3424 URL: https://issues.apache.org/jira/browse/LUCENE-3424 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0

Based on the discussion on the [mailing list|http://mail-archives.apache.org/mod_mbox/lucene-dev/201109.mbox/%3CCAAHmpki-h7LUZGCUX_rfFx=q5-YkLJei+piRG=oic8d1pnr...@mail.gmail.com%3E] IW should return sequence ids from update/delete/add and commit to allow ordering of events for consistent transaction logs and recovery.
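The proposed contract can be sketched minimally. This is a hypothetical class, not the real IndexWriter: every mutating call returns a strictly increasing sequence id drawn from one shared counter, so an external transaction log can replay concurrent operations in a consistent total order.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SeqIdWriter {
    // Hypothetical sketch of the proposed contract, not the actual
    // IndexWriter API: each mutating operation returns a sequence id
    // from one shared counter, giving a total order across concurrent
    // add/update/delete/commit calls that a transaction log can replay.
    private final AtomicLong seq = new AtomicLong();

    public long addDocument(Object doc)       { /* index doc */     return seq.incrementAndGet(); }
    public long updateDocument(Object doc)    { /* replace doc */   return seq.incrementAndGet(); }
    public long deleteDocuments(Object query) { /* delete matches */ return seq.incrementAndGet(); }
    public long commit()                      { /* flush + fsync */  return seq.incrementAndGet(); }
}
```

Because the id is assigned atomically inside the writer, a log record tagged with it can be sorted after the fact, even when several indexing threads interleave.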
Checkout SearchWorkings.org - it just went live!
Hey folks,

Some of you might have heard: a small group of other passionate search technology professionals and I have been working hard over the last few months to launch a community site known as SearchWorkings.org [1]. This initiative has been set up for search professionals to have a single point of contact, a comprehensive resource where one can learn and talk about all the exciting new developments in the world of open source search. Anyone like yourselves familiar with open source search knows that technologies like Lucene and Solr have grown tremendously in popularity over the years, but with this growth there have also come a number of challenges, such as limited support and education. With the launch of SearchWorkings.org we are convinced we will overcome and resolve some of these challenges. Covering open source search technologies from Apache Lucene and Apache Solr to Apache Mahout, one of the key objectives for the community is to create a place where search specialists can engage with one another and enjoy a single point of contact for various resources, downloads and documentation. Like any other community website, content will be added on a regular basis, and community members can also make their own contributions and stay on top of everything search related. For now, there is access to an extensive resource centre offering online tutorials, downloads, white papers and access to a host of search specialists in the forum. With the ability to post blog items and keep up to date with relevant news, the site is a search specialist's dream come true and addresses what we felt was a clear need in the market. SearchWorkings.org starts off with an initial focus on Lucene, Solr and friends, but aims to be much broader. Each of you can and should contribute: tell us your search, data-processing, setup or optimization story.
I am looking forward to more and more blogs, articles and tutorials about smaller projects like Apache Lucy, real-world case studies, or third-party extensions for OSS search components.

have fun, Simon

[1] http://www.searchworkings.org [2] Trademark Acknowledgement: Apache Lucene, Apache Solr, Apache Mahout and Apache Lucy and their respective logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Re: Regarding Transaction logging
On 09/09/2011 11:00, Simon Willnauer wrote: I created LUCENE-3424 for this. But I still would like to keep the discussion open here rather than moving this entirely to an issue. There is more about this than only the seq. ids.

I'm also concerned about the content of the transaction log. In Solr it uses javabin-encoded UpdateCommand-s (either SolrInputDocuments or Delete/Commit commands). Documents in the log are raw documents, i.e. before analysis. This may have some merits for Solr (e.g. you could imagine having different analysis chains on the Solr slaves), but IMHO it's more of a hassle for Lucene, because it means that the analysis has to be repeated over and over again on all clients. If the analysis chain is costly (e.g. NLP) then it would make sense to have an option to log documents post-analysis, i.e. as correctly typed stored values (e.g. string → numeric) AND the resulting TokenStream-s. This also has the advantage of moving us towards the dumb IndexWriter concept, i.e. separating analysis from the core inverted index functionality. So I'd argue for recording post-analysis docs in the tlog, either exclusively or as a default option.

-- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
Re: [jira] [Updated] (LUCENE-1889) FastVectorHighlighter: support for additional queries
Thanks for the recovery, Robert! Koji Sekiguchi from mobile

On 2011/09/09, at 14:49, Robert Muir (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1889: Attachment: LUCENE-1889_reader.patch here is the patch I applied, might not be the best or whatever, and see the TODO/note in the code.

FastVectorHighlighter: support for additional queries - Key: LUCENE-1889 URL: https://issues.apache.org/jira/browse/LUCENE-1889 Project: Lucene - Java Issue Type: Wish Components: modules/highlighter Reporter: Robert Muir Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.5, 4.0 Attachments: LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889_reader.patch

I am using fastvectorhighlighter for some strange languages and it is working well! One thing i noticed immediately is that many query types are not highlighted (multitermquery, multiphrasequery, etc) Here is one thing Michael M posted in the original ticket: {quote} I think a nice [eventual] model would be if we could simply re-run the scorer on the single document (using InstantiatedIndex maybe, or simply some sort of wrapper on the term vectors which are already a mini-inverted-index for a single doc), but extend the scorer API to tell us the exact term occurrences that participated in a match (which I don't think is exposed today). {quote} Due to strange requirements I am using something similar to this (but specialized to our case). I am doing strange things like forcing multitermqueries to rewrite into boolean queries so they will be highlighted, and flattening multiphrasequeries into boolean or'ed phrasequeries. I do not think these things would be 'fast', but i had a few ideas that might help: * looking at contrib/highlighter, you can support FilteredQuery in flatten() by calling getQuery() right? * maybe as a last resort, try Query.extractTerms() ?
Re: Regarding Transaction logging
+1 indeed! All possibilities are needed. One might do wild things if it is somehow typed. For example, dictionary compression for fields that are tokenized (not only stored), as we already have a Term dictionary supporting ords. Keeping just a map Token → ord with the transaction log...

On Fri, Sep 9, 2011 at 11:19 AM, Andrzej Bialecki a...@getopt.org wrote: On 09/09/2011 11:00, Simon Willnauer wrote: I created LUCENE-3424 for this. But I still would like to keep the discussion open here rather than moving this entirely to an issue. There is more about this than only the seq. ids. I'm concerned also about the content of the transaction log. In Solr it uses javabin-encoded UpdateCommand-s (either SolrInputDocuments or Delete/Commit commands). Documents in the log are raw documents, i.e. before analysis. This may have some merits for Solr (e.g. you could imagine having different analysis chains on the Solr slaves), but IMHO it's more of a hassle for Lucene, because it means that the analysis has to be repeated over and over again on all clients. If the analysis chain is costly (e.g. NLP) then it would make sense to have an option to log documents post-analysis, i.e. as correctly typed stored values (e.g. string → numeric) AND the resulting TokenStream-s. This has also the advantage of moving us towards the dumb IndexWriter concept, i.e. separating analysis from the core inverted index functionality. So I'd argue for recording post-analysis docs in the tlog, either exclusively or as a default option. -- Best regards, Andrzej Bialecki
setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java
Hello, Sorry for the newbie question. I just checked out Solr using Maven and I get the following compile error in Eclipse: The method setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java /solr/core/src/java/org/apache/solr/core line 113 Java Problem. Am I missing components? Where do you guys go to check in case of similar problems? Thank you.
Re: Regarding Transaction logging
On 09/09/2011 12:07, eks dev wrote: +1 indeed! All possibilities are needed. One might do wild things if it is somehow typed. For example, dictionary compression for fields that are tokenized (not only stored), as we already have a Term dictionary supporting ords. Keeping just a map Token → ord with the transaction log...

Hmm, you mean a per-doc map? Because a global map would have to be updated as we add new docs, which would make the writing process non-atomic, which is the last thing you want from a transaction log :) As per-doc compression, sure. In fact, what you describe is essentially a single-doc mini-index, because the map is a term dict, the token streams with ords are postings, etc.

-- Best regards, Andrzej Bialecki
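The per-doc variant under discussion can be sketched like this (a hypothetical encoding with invented names, not anything in Lucene or Solr): each log record carries its own small term dictionary plus the token stream as ords, so appending a record touches no shared state and the log write stays atomic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerDocDict {
    // Hypothetical per-doc dictionary compression for a tlog record:
    // the record stores its own term list plus the token stream as ords.
    // No global map is touched, so appending the record stays atomic.
    public static Map.Entry<List<String>, int[]> encode(List<String> tokens) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        int[] ords = new int[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            Integer ord = dict.get(tokens.get(i));
            if (ord == null) {
                ord = dict.size();
                dict.put(tokens.get(i), ord);
            }
            ords[i] = ord;
        }
        return Map.entry(new ArrayList<>(dict.keySet()), ords);
    }

    // replaying the record reconstructs the original token stream
    public static List<String> decode(List<String> terms, int[] ords) {
        List<String> out = new ArrayList<>(ords.length);
        for (int o : ords) out.add(terms.get(o));
        return out;
    }
}
```

For low-cardinality fields the term list is tiny and the ords compress well; for high-cardinality fields the per-record dictionary approaches the size of the tokens themselves, matching the trade-off discussed in the thread.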
[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101113#comment-13101113 ] Jasper van Veghel commented on SOLR-2066: - Great - ngroups is coming through now! Another thing I noticed is that highlighting doesn't work on account of the resultIds not getting set in the ResponseBuilder. It only happens in combination with _distributed_ grouping - so it works when I do this: http://localhost:8983/solr/foo/select?wt=json&rows=2&group=true&group.field=dcterms_source&group.ngroups=true&hl=true Or this: http://localhost:8983/solr/foo/select?wt=json&rows=2&shards=localhost:8983/solr/foo,localhost:8983/solr/bar&hl=true But not this: http://localhost:8983/solr/foo/select?wt=json&rows=2&group=true&group.field=dcterms_source&group.ngroups=true&shards=localhost:8983/solr/foo,localhost:8983/solr/bar&hl=true Stacktrace: {code}SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:156) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1407) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:680){code} Search Grouping: support distributed search --- Key: SOLR-2066 URL: https://issues.apache.org/jira/browse/SOLR-2066 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Fix For: 3.5, 4.0 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch Support distributed field collapsing / search grouping.
(SOLR-2726) NullPointerException when using spellcheck.q
Well, the patch is attached to the issue and the problem is fixed. But how do I commit it to svn so that it is fixed with the next build? And how do I set Status and Resolution for this issue? Any ideas? I'm really trying to help, but if I fix something and can't get it into svn so that it is fixed in later versions, I have to fork and maintain my own version with further fixes and patches. Bernd
Re: Regarding Transaction logging
I didn't think, it was just a spontaneous reaction :) At the moment I am using static dictionaries to at least get a grip on the size of stored fields (escaping encoded terms).

Re: Global. Maybe the trick would be to somehow use the term dictionary, as it must be *eventually* updated? An idea is to write the raw token stream for atomicity and reduce it later in a compaction phase (e.g. on Lucene commit())... no matter what we plan to do, TL compaction is going to be needed? It is slightly a moving-target problem (the TL chases the term dictionary), but I am sure the benefits can be huge. A compacted TL entry would need to have a pointer to the Term[] used to encode it, but this is by all means doable, just a simple Term[]. It surely makes not much sense for high-cardinality fields, but if you have something with low cardinality (indexed and stored) on a big (100 million) collection, this reduces space by exorbitant amounts. I do not know, just trying to build upon the fact that we have the term dictionary updated in any case. This works not only for transaction logging, but also for (Analyzed)-{Stored, indexed} fields. By the way, I never looked at how our term vectors work; do they keep a reference to the token or a verbatim copy of the term?

On Fri, Sep 9, 2011 at 12:31 PM, Andrzej Bialecki a...@getopt.org wrote: On 09/09/2011 12:07, eks dev wrote: +1 indeed! All possibilities are needed. One might do wild things if it is somehow typed. For example, dictionary compression for fields that are tokenized (not only stored), as we already have a Term dictionary supporting ords. Keeping just a map Token → ord with the transaction log... Hmm, you mean a per-doc map? Because a global map would have to be updated as we add new docs, which would make the writing process non-atomic, which is the last thing you want from a transaction log :) As per-doc compression, sure. In fact, what you describe is essentially a single-doc mini-index, because the map is a term dict, the token streams with ords are postings, etc.
-- Best regards, Andrzej Bialecki
[jira] [Updated] (LUCENE-1889) FastVectorHighlighter: support for additional queries
[ https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-1889: - Attachment: LUCENE-1889-solr.patch Sorry, forgot to include changes to DefaultSolrHighlighter as well (it gets confusing maintaining multiple patches in the same build). I do think the non-reader method should be deprecated as in Robert's comment.

FastVectorHighlighter: support for additional queries - Key: LUCENE-1889 URL: https://issues.apache.org/jira/browse/LUCENE-1889 Project: Lucene - Java Issue Type: Wish Components: modules/highlighter Reporter: Robert Muir Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.5, 4.0 Attachments: LUCENE-1889-solr.patch, LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889_reader.patch

I am using fastvectorhighlighter for some strange languages and it is working well! One thing i noticed immediately is that many query types are not highlighted (multitermquery, multiphrasequery, etc) Here is one thing Michael M posted in the original ticket: {quote} I think a nice [eventual] model would be if we could simply re-run the scorer on the single document (using InstantiatedIndex maybe, or simply some sort of wrapper on the term vectors which are already a mini-inverted-index for a single doc), but extend the scorer API to tell us the exact term occurrences that participated in a match (which I don't think is exposed today). {quote} Due to strange requirements I am using something similar to this (but specialized to our case). I am doing strange things like forcing multitermqueries to rewrite into boolean queries so they will be highlighted, and flattening multiphrasequeries into boolean or'ed phrasequeries. I do not think these things would be 'fast', but i had a few ideas that might help: * looking at contrib/highlighter, you can support FilteredQuery in flatten() by calling getQuery() right? * maybe as a last resort, try Query.extractTerms() ?
Re: Regarding Transaction logging
On 09/09/2011 13:20, eks dev wrote: I didn't think, it was just a spontaneous reaction :) At the moment I am using static dictionaries to at least get a grip on size of stored fields (escaping encoded terms) Re: Global Maybe the trick would be to somehow use term dictionary as it must be *eventually* updated? An idea is to write raw token stream for atomicity and reduce it later in compaction phase (e.g on lucene commit())... no matter what we plan do, TL compaction is going to be needed? Compaction - not sure, it would have to preserve the ordering of ops. But some form of primitive compression - certainly, delta coding, vints, etc, anything that can be done per doc, without the need to use data that spans more than 1 record. It is slightly moving target problem (TL chases term dictionary), but I am sure, benefits can be huge. compacted TL entry would need to have a pointer to Term[] used to encode it, but this is by all means doable, just simple Term[]. It surely makes not much sense for high cardinality fields, but if you have something with low cardinality (indexed and stored) on a big (100Mio) collection, this reduces space by exorbitant amounts. I do not know, just trying to build upon the fact that we have term dictionary updated in any case--- If the tlog has a Commit op, then you could theoretically compact all preceding entries ... at least their term dicts. If you compacted the postings, too, then you would essentially have a multi-doc index (naked segment), but it would not be a transaction log anymore, because the update ordering wouldn't be preserved (e.g. intermediate Delete ops would have a different effect). This works not only for transaction logging, but also for (Analyzed)-{Stored , indexed} fields. By the way, I never look how our term vectors work, keeping reference to token or verbatim term copy? It's like term dict + postings, terms are delta front coded like the main term dictionary. 
It does not reuse terms from the main dict, I think this representation was chosen to avoid ord renumbering when the main term dict is updated - you would have to renumber all term vectors on each commit...

-- Best regards, Andrzej Bialecki
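The per-record compression mentioned earlier in the thread (delta coding plus vints) can be sketched without any cross-record state; this is a generic sketch assuming ascending values such as sorted ords or positions, not Lucene's actual codec:

```java
import java.io.ByteArrayOutputStream;

public class VIntCodec {
    // Sketch of per-record compression that needs no cross-record state:
    // classic variable-length ints (7 data bits per byte, high bit = more),
    // applied to deltas between ascending values. Negative deltas are not
    // supported, so input must be sorted ascending.
    public static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int v : values) {
            int delta = v - prev; // delta-code ascending values
            prev = v;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    public static int[] decode(byte[] bytes, int count) {
        int[] out = new int[count];
        int pos = 0, prev = 0;
        for (int i = 0; i < count; i++) {
            int shift = 0, v = 0;
            byte b;
            do {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += v;
            out[i] = prev;
        }
        return out;
    }
}
```

Because each record is self-delimiting and depends only on its own bytes, compacting or truncating the log never needs data that spans more than one record, which is exactly the constraint Andrzej raises.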
Re: setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java
DocumentBuilderFactory is a Java class, not a Solr class, what version of Java are you trying to compile against? Best Erick

On Fri, Sep 9, 2011 at 5:53 AM, swiss knife swiss_kn...@email.com wrote: Hello, Sorry for the newbie question, I just checked out Solr using maven and I get the following compile error on Eclipse: The method setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java /solr/core/src/java/org/apache/solr/core line 113 Java Problem Am I missing components ? where do you guys go to check in case of similar problems ? Thank you.
Re: setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java
On 09/09/2011 14:08, Erick Erickson wrote: DocumentBuilderFactory is a Java class, not a Solr class, what version of Java are you trying to compile against? Best Erick On Fri, Sep 9, 2011 at 5:53 AM, swiss knife swiss_kn...@email.com wrote: Hello, Sorry for the newbie question, I just checked out Solr using maven and I get the following compile error on Eclipse: The method setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java /solr/core/src/java/org/apache/solr/core line 113 Java Problem Am I missing components ? where do you guys go to check in case of similar problems ?

I had a similar problem in another project; it turned out to be a conflicting version of the XML APIs in the JRE and a particular version of Xerces. AFAIK this issue has been resolved in recent versions of Xerces. If you can't upgrade Xerces, try swapping the order of classpath entries, so that one (or the other, can't remember which one...) API takes precedence.

-- Best regards, Andrzej Bialecki
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101152#comment-13101152 ] Jan Høydahl commented on SOLR-2366: --- Hoss: Good comments, which need to be decided upon, including corner cases.

1) bq. I would suggest define any case where the spec contains absolute value N after (effective) value M where N < M as an error and fail fast.
Agree.
bq. Still not sure what (if anything) should be done about overlapping ranges that appear out of order (ie: 0,100,50..90,150 ... is that 0-100,50-90,90-150 ?)
If all gaps are specified as explicit ranges there is no ambiguity, so we could require all gaps to be explicit ranges if one wants to use this?

2) bq. The first three examples suggest that * will be treated as -Infinity and +Infinity based on position (ie: the first and last ranges will be unbounded on one end) but in the last example the wording ...100-200, 200-300 repeating until max seems inconsistent with that.
Agree. The 0,10,50,+50,+100,* example would create infinite gaps, which would be less than desirable. But 0,10,50,+50,+100,500 would give repeating 100-gaps until upper bound 500, while 0,10,50,+50,+100,500,* would in addition give a last range 500-*. That was the intended syntax.
bq. If we want to support the idea of repeat the last increment continuously that should be with it's own repeat syntax such as the ... (three dots) i suggested in comment 17/Feb/11 23:50 above. I would argue that this should only be legal after an increment and before a concrete value (ie: 0,+10,...,100). Requiring it to follow an increment seems like a given (otherwise what exactly are you repeating?) requiring that it be followed by an absolute value is based on my concern that if it's the last item in the spec (or the last item before *) it results in an infinite number of ranges.
Agree.
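One possible reading of the mixed absolute/increment spec can be sketched as a small parser. This is hypothetical (class and method names invented, and the syntax itself was still under discussion on the issue): absolute values are bounds, "+N" adds one increment, and an increment followed by an absolute value repeats until it reaches that value.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSpecParser {
    // Hypothetical parser for the proposed facet.range.spec syntax:
    // "0,10,50,+50,+100,500" -> bounds 0,10,50,100,200,300,400,500.
    // The spec must start with an absolute value.
    public static List<Long> parse(String spec) {
        List<Long> b = new ArrayList<>();
        String[] parts = spec.split(",");
        int i = 0;
        while (i < parts.length) {
            String p = parts[i].trim();
            if (!p.startsWith("+")) {
                b.add(Long.parseLong(p)); // absolute bound
                i++;
            } else {
                long inc = Long.parseLong(p.substring(1));
                boolean repeat = i + 1 < parts.length
                        && !parts[i + 1].trim().startsWith("+");
                if (!repeat) {
                    // increment followed by another increment: apply once
                    b.add(b.get(b.size() - 1) + inc);
                    i++;
                } else {
                    // increment followed by an absolute value: repeat up to it
                    long stop = Long.parseLong(parts[i + 1].trim());
                    long cur = b.get(b.size() - 1);
                    while (cur + inc <= stop) { cur += inc; b.add(cur); }
                    if (b.get(b.size() - 1) != stop) b.add(stop);
                    i += 2;
                }
            }
        }
        return b;
    }
}
```

Each adjacent pair of bounds then forms one facet range, so 0,10,50,100,... yields 0-10, 10-50, 50-100, and so on.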
Alternatively, if Solr could compute myField.max(), the useful value of * could be computed a bit smarter, but that would probably be hard to scale in a multi-shard setting. 3) bq.That seems like it isn't specific enough about what is/isn't going to be allowed – particularly since all of the facet.range params can be specified on a per field basis. Didn't really think much about the global params. Silently ignoring gap, begin, end, other would be one way to go, but then the error feedback is not explicit in case of misunderstanding; the user will see that he does not get back what he thought, and start reading the documentation :) I have no good answer to this, other than inventing some syntax. The default could be that facet.range.spec respects the global values for start and end, but we could also allow explicitly overriding start and end values as part of the spec with a special syntax. The following params would result in the ranges 0-1, 1-2, 2-3, 3-5, 5-10: {noformat} facet.range.start=0 facet.range.end=10 facet.range.gap=2 f.bedrooms.facet.range.spec=1,2,3,5 {noformat} But these params would result in the same ranges, because we specify start and end with a special syntax: N.. for start and ..M for end: {noformat} facet.range.start=100 facet.range.end=200 facet.range.gap=10 f.bedrooms.facet.range.spec=0..,1,2,3,5,..10 {noformat} This would be equivalent to adding the two params f.bedrooms.facet.range.start=0 & f.bedrooms.facet.range.end=10, which could then still be allowed as an alternative. If the first value of the spec is not an N.., we'll require a facet.range.start. If the last value of the spec is not ..M, we'll require facet.range.end. Also, it must not be allowed to specify both a global facet.range.gap and a global facet.range.spec. Would this be a good compromise? :-) My primary reason for suggesting this is to give users a terse, intuitive syntax for ranges. 4) bq.Should all ranges produced by facet.range.spec be considered gap ranges?
even the ones with no lower/upper bound? Good question. I think the values facet.range.include=upper/lower are clear. Outer/edge would need some more work/definition. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 3.4, 4.0 Attachments: SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. I'd propose the syntax to be a comma separated list of sizes for each bucket.
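The boundary-list part of the proposed facet.range.spec syntax can be sketched as a small parser. This is a hypothetical illustration, not Solr code: class and method names are invented, and the increment (+N) and repeat (...) forms are deliberately left out. It only handles plain boundaries plus the N.. / ..M start/end overrides discussed above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed facet.range.spec boundary syntax:
 *  "N.." overrides facet.range.start, "..M" overrides facet.range.end,
 *  and plain values are interior range boundaries. */
public class RangeSpecSketch {
    public static List<String> ranges(String spec, Integer defaultStart, Integer defaultEnd) {
        Integer start = defaultStart, end = defaultEnd;
        List<Integer> bounds = new ArrayList<Integer>();
        for (String p : spec.split(",")) {
            if (p.endsWith(".."))        // "0.." -> explicit start
                start = Integer.valueOf(p.substring(0, p.length() - 2));
            else if (p.startsWith("..")) // "..10" -> explicit end
                end = Integer.valueOf(p.substring(2));
            else
                bounds.add(Integer.valueOf(p));
        }
        if (start == null || end == null)
            throw new IllegalArgumentException("start/end required when spec omits N.. or ..M");
        bounds.add(0, start);
        bounds.add(end);
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 1 < bounds.size(); i++)
            out.add(bounds.get(i) + "-" + bounds.get(i + 1));
        return out;
    }
}
```

With this sketch, `ranges("1,2,3,5", 0, 10)` and `ranges("0..,1,2,3,5,..10", 100, 200)` both yield the ranges 0-1, 1-2, 2-3, 3-5, 5-10, matching the two param examples in the comment.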
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101185#comment-13101185 ] Jan Høydahl commented on SOLR-2366: --- I've given the Wiki page another take, with the new proposed start/end syntax and added an example or two. The mutually exclusive sentence now boils down to facet.range.gap/facet.range.spec being mutually exclusive (on the same field). Have a look at http://wiki.apache.org/solr/VariableRangeGaps#facet.range.spec Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 3.4, 4.0 Attachments: SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. I'd propose the syntax to be a comma separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different size buckets. If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this) For instance, facet.range.start=0 facet.range.end=400 facet.range.gap=5,25,50,100 would yield buckets of: 0-5,5-30,30-80,80-180,180-280,280-380,380-400 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
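Grant's original multi-gap proposal quoted above (the last gap repeats until the remaining space is filled, with the final bucket clipped at facet.range.end) can be sketched in a few lines. This is a hypothetical illustration, not Solr code; the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the original facet.range.gap=g1,g2,... proposal:
 *  consume the listed gaps in order, then repeat the last gap until end,
 *  clipping the final bucket at end. */
public class MultiGapBuckets {
    public static List<String> buckets(int start, int end, int[] gaps) {
        List<String> out = new ArrayList<String>();
        int lo = start, i = 0;
        while (lo < end) {
            int gap = gaps[Math.min(i++, gaps.length - 1)]; // last gap repeats
            int hi = Math.min(lo + gap, end);               // clip at end
            out.add(lo + "-" + hi);
            lo = hi;
        }
        return out;
    }

    public static void main(String[] args) {
        // facet.range.start=0 facet.range.end=400 facet.range.gap=5,25,50,100
        System.out.println(buckets(0, 400, new int[]{5, 25, 50, 100}));
        // [0-5, 5-30, 30-80, 80-180, 180-280, 280-380, 380-400]
    }
}
```

Running it reproduces the seven buckets listed in the issue description, including the clipped 380-400 bucket.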
[jira] [Updated] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2366: -- Description: There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax) was: There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. I'd propose the syntax to be a comma separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different size buckets. If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this) For instance, facet.range.start=0 facet.range.end=400 facet.range.gap=5,25,50,100 would yield buckets of: 0-5,5-30,30-80,80-180,180-280,280-380,380-400 Here's Grant's original syntax proposal which is removed from issue description to avoid confusion: {quote} I'd propose the syntax to be a comma separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different size buckets. 
If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this) For instance, facet.range.start=0 facet.range.end=400 facet.range.gap=5,25,50,100 would yield buckets of: 0-5,5-30,30-80,80-180,180-280,280-380,380-400 {quote} Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 3.4, 4.0 Attachments: SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101204#comment-13101204 ] Jan Høydahl commented on SOLR-2366: --- One thing this improvement needs to tackle is how to return the range buckets in the Response. The simple range_facet format will not be enough:
{code:xml}
<lst name="facet_ranges">
  <lst name="url_length">
    <lst name="counts">
      <int name="42">1</int>
      <int name="45">1</int>
      <int name="51">1</int>
      <int name="66">1</int>
    </lst>
    <int name="gap">3</int>
    <int name="start">0</int>
    <int name="end">102</int>
  </lst>
</lst>
{code}
We need something which can return the explicit ranges, similar to what facet_queries has. This format can then be used for the old plain gap format as well:
{code:xml}
<lst name="facet_ranges">
  <lst name="url_length">
    <lst name="counts">
      <int name="[42 TO 45}">1</int>
      <int name="[45 TO 48}">1</int>
      <int name="[51 TO 54}">1</int>
      <int name="[66 TO 69}">1</int>
    </lst>
    <int name="gap">3</int>
    <int name="start">0</int>
    <int name="end">102</int>
  </lst>
  <lst name="bedrooms">
    <lst name="counts">
      <int name="[1 TO *]">12</int>
      <int name="[2 TO *]">31</int>
      <int name="[3 TO *]">26</int>
      <int name="[4 TO *]">9</int>
    </lst>
    <str name="spec">1..*,2..*,3..*,4..*</str>
    <str name="include">all</str>
  </lst>
</lst>
{code}
Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 3.4, 4.0 Attachments: SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax) -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Matching String + Matching String positions / Lucene 2.3.1
Hi, I'm currently working with Lucene 2.3.1. The aim of my application is to query (from an HTML form) my Lucene index, then display results (in a web page) with: - matching strings - matching string positions / offsets For example: - The document: when i was a child, i was a Jedi. - The query: w* I would like to obtain something like: - 3 results - matching strings: when, was, was - offsets: 0-3; 7-9; 22-24 I've tried to use the TermPositionVector class: [...] Hits hits = searcher.search(query); for (int i = 0; i < hitCount; i++) { TermPositionVector tfv = (TermPositionVector) searcher.getIndexReader().getTermFreqVector(hits.id(i), "content"); [...] I will be able to get positions with this vector, but first I have to get the matching strings (when, was for the w* query). According to you, what is the easiest way? Thank you Jérôme
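One hedged approach, sketched against the Lucene 2.3 API (not compiled here, and it assumes the "content" field was indexed with Field.TermVector.WITH_POSITIONS_OFFSETS): rewrite the wildcard query against the reader so it expands into concrete terms, collect those terms, then look each one up in the document's TermPositionVector to get its offsets.

```java
// Sketch, Lucene 2.3 API assumed. Expand w* into concrete terms first:
IndexReader reader = searcher.getIndexReader();
Query rewritten = query.rewrite(reader);   // e.g. w* -> when, was, ...
Set terms = new HashSet();
rewritten.extractTerms(terms);             // Set of Term

TermPositionVector tpv = (TermPositionVector)
    reader.getTermFreqVector(hits.id(i), "content");
for (Iterator it = terms.iterator(); it.hasNext();) {
    Term t = (Term) it.next();
    int idx = tpv.indexOf(t.text());
    if (idx < 0) continue;                 // term not present in this doc's field
    TermVectorOffsetInfo[] offsets = tpv.getOffsets(idx);
    if (offsets == null) continue;         // offsets were not stored
    for (int j = 0; j < offsets.length; j++) {
        System.out.println(t.text() + ": " +
            offsets[j].getStartOffset() + "-" + offsets[j].getEndOffset());
    }
}
```

The rewrite step is what turns the prefix query into the matching strings; the term vector lookup then supplies positions and character offsets without re-analyzing the document.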
Re: setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java
Thank you. Actually javax.xml.parsers.DocumentBuilderFactory.class is included in the JRE 1.6 and also in xml-apis-1.0.b2.jar, where the versions of DocumentBuilderFactory differ. I removed the dependency on xml-apis-1.0.b2.jar and it went through. What am I supposed to do from here? Enter a JIRA, see what needs to be done to commit the change? On 09/09/2011 14:08, Erick Erickson wrote: DocumentBuilderFactory is a Java class, not a Solr class, what version of Java are you trying to compile against? Best Erick On Fri, Sep 9, 2011 at 5:53 AM, swiss knife swiss_kn...@email.com wrote: Hello, Sorry for the newbie question, I just checked out Solr using maven and I get the following compile error on Eclipse: The method setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java /solr/core/src/java/org/apache/solr/core line 113 Java Problem Am I missing components? Where do you guys go to check in case of similar problems? I had a similar problem in another project, it turned out to be a conflicting version of the XML APIs in the JRE and a particular version of Xerces. AFAIK this issue has been resolved in the recent versions of Xerces. If you can't upgrade Xerces, try swapping the order of classpath entries, so that one (or the other, can't remember which one...) API takes precedence. -- Best regards, Andrzej Bialecki | Information Retrieval, Semantic Web | Embedded Unix, System Integration | http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
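If the stray xml-apis jar arrives transitively via Maven rather than as a direct dependency, an exclusion keeps the JRE 1.6 copy of DocumentBuilderFactory in charge. A hypothetical POM fragment, assuming the offending dependency has first been identified with mvn dependency:tree (the group/artifact coordinates below are placeholders):

```xml
<!-- Hypothetical sketch: exclude the old xml-apis jar pulled in transitively,
     so the JRE 1.6 javax.xml.parsers classes (which have setXIncludeAware)
     are the ones on the compile classpath. -->
<dependency>
  <groupId>some.group</groupId>
  <artifactId>library-pulling-in-xml-apis</artifactId>
  <version>...</version>
  <exclusions>
    <exclusion>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

This achieves the same effect as deleting the jar from the build path, but survives a re-import of the Maven project into Eclipse.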
[Lucene.Net] How to add document to more than one index (but only analyze once)?
Is it possible to add a document to more than one index at the same time, such that document fields are only analyzed one time? For instance, to add a document to both a master index and a smaller near-real-time index. I would like to avoid analyzing document fields more than once, but I don't see whether that is possible at all using the Lucene API. Thanks, Bob
[jira] [Updated] (SOLR-2750) Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor
[ https://issues.apache.org/jira/browse/SOLR-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2750: -- Attachment: SOLR-2750.patch This patch attempts to fix this for trunk Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor Key: SOLR-2750 URL: https://issues.apache.org/jira/browse/SOLR-2750 Project: Solr Issue Type: Bug Reporter: Mark Miller Attachments: SOLR-2750.patch CoreAdminHandler#handleMergeAction DataImportHandler#handleRequestBody -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 414 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/414/ 1 tests failed. REGRESSION: org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration Error Message: null Stack Trace: junit.framework.AssertionFailedError: at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:217) Build Log (for compile errors): [...truncated 10911 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2750) Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor
[ https://issues.apache.org/jira/browse/SOLR-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2750: -- Attachment: SOLR-2750-branch_3x.patch Patch for branch_3x Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor Key: SOLR-2750 URL: https://issues.apache.org/jira/browse/SOLR-2750 Project: Solr Issue Type: Bug Reporter: Mark Miller Attachments: SOLR-2750-branch_3x.patch, SOLR-2750.patch CoreAdminHandler#handleMergeAction DataImportHandler#handleRequestBody -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2750) Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor
[ https://issues.apache.org/jira/browse/SOLR-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2750: -- Fix Version/s: 4.0 3.4 Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor Key: SOLR-2750 URL: https://issues.apache.org/jira/browse/SOLR-2750 Project: Solr Issue Type: Bug Reporter: Mark Miller Fix For: 3.4, 4.0 Attachments: SOLR-2750-branch_3x.patch, SOLR-2750.patch CoreAdminHandler#handleMergeAction DataImportHandler#handleRequestBody -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #236: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/236/ 1 tests failed. REGRESSION: org.apache.solr.search.TestRealTimeGet.testStressGetRealtime Error Message: java.lang.AssertionError: Some threads threw uncaught exceptions! Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:689) at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:89) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35) at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:146) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103) at $Proxy0.invoke(Unknown Source) at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:145) at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:87) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69) Caused by: java.lang.AssertionError: Some threads threw uncaught exceptions! at org.junit.Assert.fail(Assert.java:91) at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:717) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:661) ... 31 more Build Log (for compile errors): [...truncated 23347 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Lucene.Net] How to add document to more than one index (but only analyze once)?
How about indexing the new document(s) in memory using RAMDirectory, then calling indexWriter.AddIndexesNoOptimize for the NRT master index? DIGY On Fri, Sep 9, 2011 at 5:33 PM, Robert Stewart robert_stew...@epam.com wrote: Is it possible to add a document to more than one index at the same time, such that document fields are only analyzed one time? For instance, to add document to both a master index, and a smaller near real-time index. I would like to avoid analyzing document fields more than once but I dont see if that is possible at all using Lucene API. Thanks, Bob
Re: [Lucene.Net] How to add document to more than one index (but only analyze once)?
That sounds like a good plan. How will that affect existing merge scheduling? For the master index I use a merge factor of 2. On Sep 9, 2011, at 11:44 AM, digy digy wrote: How about indexing the new document(s) in memory using RAMDirectory then calling indexWriter.AddIndexesNoOptimize for NRT master index? DIGY On Fri, Sep 9, 2011 at 5:33 PM, Robert Stewart robert_stew...@epam.com wrote: Is it possible to add a document to more than one index at the same time, such that document fields are only analyzed one time? For instance, to add document to both a master index, and a smaller near real-time index. I would like to avoid analyzing document fields more than once but I dont see if that is possible at all using Lucene API. Thanks, Bob
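DIGY's suggestion can be sketched against the (Java) Lucene 2.x/3.x API, which Lucene.Net mirrors with Pascal-cased names. This is an uncompiled sketch with assumed variable names (analyzer, doc, masterWriter): fields are analyzed once into an in-memory segment, and the master then merges in the already-inverted segment without re-analysis.

```java
// Sketch (Java Lucene 2.x/3.x API): analyze once into a RAMDirectory.
RAMDirectory ramDir = new RAMDirectory();
IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true,
    IndexWriter.MaxFieldLength.UNLIMITED);
ramWriter.addDocument(doc);   // analysis/inversion happens here, once
ramWriter.close();

// The NRT index can open a reader on ramDir directly; the master index
// folds in the pre-built segment, so fields are not analyzed a second time.
masterWriter.addIndexesNoOptimize(new Directory[] { ramDir });
```

Since addIndexesNoOptimize only copies segments and lets the normal merge policy run afterwards, a merge factor of 2 on the master would simply see one extra small segment per batch; batching several documents per RAMDirectory keeps the segment count down.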
[VOTE] Release Lucene/Solr 3.4.0, RC1
Please vote to release the RC1 artifacts at: https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142 as Lucene 3.4.0 and Solr 3.4.0. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Regarding Transaction logging
On Fri, Sep 9, 2011 at 11:19 AM, Andrzej Bialecki a...@getopt.org wrote: On 09/09/2011 11:00, Simon Willnauer wrote: I created LUCENE-3424 for this. But I still would like to keep the discussion open here rather than moving this entirely to an issue. There is more to this than only the seq. ids. I'm concerned also about the content of the transaction log. In Solr it uses javabin-encoded UpdateCommand-s (either SolrInputDocuments or Delete/Commit commands). Documents in the log are raw documents, i.e. before analysis. This may have some merits for Solr (e.g. you could imagine having different analysis chains on the Solr slaves), but IMHO it's more of a hassle for Lucene, because it means that the analysis has to be repeated over and over again on all clients. If the analysis chain is costly (e.g. NLP) then it would make sense to have an option to log documents post-analysis, i.e. as correctly typed stored values (e.g. string -> numeric) AND the resulting TokenStream-s. This also has the advantage of moving us towards the dumb IndexWriter concept, i.e. separating analysis from the core inverted index functionality. So I'd argue for recording post-analysis docs in the tlog, either exclusively or as a default option. I am not sure if this should be the default option, but I would need to see how this is implemented. If we can efficiently support such a preanalyzed document I am all for it. But I think it should be possible to write opaque documents too. Other implementations / users of Lucene should be able to write their app-specific format too. simon -- Best regards, Andrzej Bialecki
___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Fwd: [Orekit Developers] Distribution of orekit-python egg
On Fri, 9 Sep 2011, Petrus Hyvönen wrote: As this discussion could be of potential interest to others who make Python modules using JCC, I cross-post it. I am really not experienced in the licensing issues of open source, closed source etc. A JCC-generated external Java library .egg does contain some JCC code; is there something I need to do to ensure that the JCC license is fulfilled? In my case the wrapped library is also under Apache, so I guess it should not be an issue to put the egg under Apache? As the licenses are the same, I can't imagine an issue here. In the LICENSE file - or is that the NOTICE file - of your distribution, you should include a copy of the Apache 2 license for JCC. That being said, I am not a lawyer... Giving attribution to Apache JCC is always appreciated, though. Thanks! Andi.. Regards /Petrus -- Forwarded message -- From: Petrus Hyvönen petrus.hyvo...@gmail.com Date: Fri, Sep 9, 2011 at 10:33 AM Subject: Re: [Orekit Developers] Distribution of orekit-python egg To: orekit-develop...@orekit.org Hi Luc, Good to hear, I think it is good to have an easy entry to Orekit through Python. I really don't know much about the licensing stuff. - My work in this process is rather minimal, issuing the right command line parameters. - The JCC tool that I use for wrapping is under the Apache 2.0 license as well. http://pypi.python.org/pypi/JCC/ - Orekit and the commons math library are included in the egg. (commons math assumed to be under Apache as well) A Microsoft compiler has been used in the process, but as I understand it this does not affect the licensing? Please send me the paper and I can understand it better; I am not really clear right now what the actual content of the authorization is about.
Regards /Petrus On Fri, Sep 9, 2011 at 10:04 AM, MAISONOBE Luc luc.maison...@c-s.fr wrote: Hi Petrus, Petrus Hyvönen petrus.hyvo...@gmail.com a écrit : Hi, This is mostly a legal/policy question: would the license of Orekit allow, and would it be preferred, that I create some Python eggs (Python installer packages) for Orekit? The license allows it and it would be interesting to do. I am using JCC to wrap the Orekit Java library so it is accessible from standard Python. All instructions needed are on the Orekit wiki. After an update to Python 2.7 and a rebuild of the wrapping, I realized that there are some caveats involved, on at least the Windows operating system (such as needing Microsoft Visual C installed), to create the wrapping, but not to run it. Once built, I can easily make the eggs, which could be put on the Orekit or another webpage and easily installed, without the need of that compiler etc. If you want it, and if you choose to have all additional wrapping code and packaging distributed under the same terms as Orekit, i.e. the Apache 2 Software License, then yes, we could host them here on the main Orekit site. You would of course be credited for that. Before we can really distribute your work, we would ask you to sign an Individual Contributor License Agreement, which basically says you authorize us to distribute what you did. We follow exactly the same process as the Apache Software Foundation for that. I can send you the template of the paper. Thanks Luc Let me know what you think /Petrus This message was sent using IMP, the Internet Messaging Program. -- Petrus Hyvönen, Uppsala, Sweden Mobile Phone/SMS: +46 73 803 19 00
[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2959: Attachment: LUCENE-2959.patch Attached is a diff (using dev-tools/scripts/diff-sources.py) between the branch and trunk. I think this is ready. [GSoC] Implementing State of the Art Ranking for Lucene --- Key: LUCENE-2959 URL: https://issues.apache.org/jira/browse/LUCENE-2959 Project: Lucene - Java Issue Type: New Feature Components: core/query/scoring, general/javadocs, modules/examples Reporter: David Mark Nemeskey Assignee: Robert Muir Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: flexscoring branch Attachments: LUCENE-2959.patch, LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, implementation_plan.pdf, proposal.pdf Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions. The wiki page for the project can be found at http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2959: Attachment: LUCENE-2959.patch Oops, I had some stuff in solr/example/work that caused a lot of noise in the patch. [GSoC] Implementing State of the Art Ranking for Lucene --- Key: LUCENE-2959 URL: https://issues.apache.org/jira/browse/LUCENE-2959 Project: Lucene - Java Issue Type: New Feature Components: core/query/scoring, general/javadocs, modules/examples Reporter: David Mark Nemeskey Assignee: Robert Muir Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: flexscoring branch, 4.0 Attachments: LUCENE-2959.patch, LUCENE-2959.patch, LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, implementation_plan.pdf, proposal.pdf Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions. The wiki page for the project can be found at http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2959: Fix Version/s: 4.0 [GSoC] Implementing State of the Art Ranking for Lucene --- Key: LUCENE-2959 URL: https://issues.apache.org/jira/browse/LUCENE-2959 Project: Lucene - Java Issue Type: New Feature Components: core/query/scoring, general/javadocs, modules/examples Reporter: David Mark Nemeskey Assignee: Robert Muir Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: flexscoring branch, 4.0 Attachments: LUCENE-2959.patch, LUCENE-2959.patch, LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, implementation_plan.pdf, proposal.pdf Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions. The wiki page for the project can be found at http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101391#comment-13101391 ] Jason Rutherglen commented on LUCENE-2312: --
There are many important use cases for immediate / zero-delay index readers. I'm not sure if people realize it, but one of the major gains from this issue is the ability to obtain a reader after every indexed document. In this case, instead of performing an array copy of the RT data structures, we will queue the changes and then apply them to the new reader. For arrays like term freqs, we will use a temp hash map of the changes made since the main array was created (when the hash map grows too large we can perform a full array copy).
Search on IndexWriter's RAM Buffer -- Key: LUCENE-2312 URL: https://issues.apache.org/jira/browse/LUCENE-2312 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Michael Busch Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, LUCENE-2312.patch, LUCENE-2312.patch
In order to offer users near-realtime search without incurring an indexing performance penalty, we can implement search on IndexWriter's RAM buffer. This is the buffer that is filled in RAM as documents are indexed. Currently the RAM buffer is flushed to the underlying directory (usually disk) before being made searchable. Today's Lucene-based NRT systems must incur the cost of merging segments, which can slow indexing. Michael Busch has good suggestions regarding how to handle deletes using max doc ids. https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 The area that isn't fully fleshed out is the terms dictionary, which needs to be sorted prior to queries executing. Currently IW implements a specialized hash table.
Michael B has a suggestion here: https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
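The temp-hash-map scheme Jason describes for arrays like term freqs can be sketched as follows. This is a minimal illustration of the idea only, not Lucene internals; the class name and fold threshold are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: a frozen base array of term frequencies shared with readers,
 * plus a small overlay map of changes made since the array was created.
 * When the overlay grows too large, the changes are folded back into a
 * fresh full copy of the array, as described in the comment above.
 */
class OverlayTermFreqs {
    private int[] base;                                   // frozen snapshot
    private final Map<Integer, Integer> delta = new HashMap<>();
    private final int maxDelta;                           // fold threshold

    OverlayTermFreqs(int[] base, int maxDelta) {
        this.base = base;
        this.maxDelta = maxDelta;
    }

    int freq(int termId) {
        Integer d = delta.get(termId);                    // overlay wins
        return d != null ? d : base[termId];
    }

    void increment(int termId) {
        delta.put(termId, freq(termId) + 1);
        if (delta.size() > maxDelta) {                    // full array copy
            int[] copy = base.clone();
            for (Map.Entry<Integer, Integer> e : delta.entrySet()) {
                copy[e.getKey()] = e.getValue();
            }
            base = copy;
            delta.clear();
        }
    }
}
```

A new reader sees a consistent snapshot cheaply (base array plus small map), while the expensive full copy happens only when the overlay exceeds the threshold.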
[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated SOLR-2066: Attachment: SOLR-2066.patch
Good catch, Jasper! I've updated the patch to fix this issue. I also added highlighting with distributed grouping to the tests.
Search Grouping: support distributed search --- Key: SOLR-2066 URL: https://issues.apache.org/jira/browse/SOLR-2066 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Fix For: 3.5, 4.0 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch
Support distributed field collapsing / search grouping.
3.4.0 draft release notes
I just took a first crack at them: http://wiki.apache.org/solr/ReleaseNote34 http://wiki.apache.org/lucene-java/ReleaseNote34 Feel free to go improve them before we release! Mike McCandless http://blog.mikemccandless.com
[jira] [Resolved] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances
[ https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3416. - Resolution: Fixed
Committed to trunk. Thanks, Shay.
Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances -- Key: LUCENE-3416 URL: https://issues.apache.org/jira/browse/LUCENE-3416 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Shay Banon Assignee: Simon Willnauer Attachments: LUCENE-3416.patch, LUCENE-3416.patch
This can come in handy when running several Lucene indices in the same VM and wishing to rate limit merges across all of them.
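The point of passing in the limiter is that one instance can be shared by several directories, pacing their combined merge IO. A minimal stand-alone sketch of byte-rate pacing follows; the class name and exact pausing strategy are illustrative, not the actual Lucene RateLimiter API.

```java
/**
 * Sketch of byte-rate limiting for merge writes: callers report bytes
 * written, and once they get ahead of the configured MB/sec budget the
 * call sleeps until they are back on pace. Sharing one instance across
 * several directories caps their combined write rate.
 */
class SimpleByteRateLimiter {
    private final double bytesPerSec;
    private final long startNanos = System.nanoTime();
    private long bytesWritten = 0;

    SimpleByteRateLimiter(double mbPerSec) {
        this.bytesPerSec = mbPerSec * 1024 * 1024;
    }

    /** Record {@code bytes} written; sleep if we are ahead of budget. */
    synchronized void pause(long bytes) {
        bytesWritten += bytes;
        // Time by which bytesWritten should have taken to stay on budget.
        long targetNanos = startNanos + (long) (bytesWritten / bytesPerSec * 1e9);
        long sleepNanos = targetNanos - System.nanoTime();
        if (sleepNanos > 0) {
            try {
                Thread.sleep(sleepNanos / 1_000_000, (int) (sleepNanos % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve interrupt status
            }
        }
    }
}
```

A directory wrapper would call `pause(n)` from its merge-output write path after each buffer of `n` bytes.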
[jira] [Created] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control
NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control Key: LUCENE-3425 URL: https://issues.apache.org/jira/browse/LUCENE-3425 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Shay Banon
A discussion on IRC raised several improvements that can be made to the NRT caching dir. Some of the problems it currently has are:
1. Not explicitly controlling the memory usage, which can result in overusing memory (for example, large new segments being committed because refreshing is too far behind).
2. Heap fragmentation because of constant allocation of (probably promoted to old gen) byte buffers.
3. Not being able to control the memory usage across indices for multi-index usage within a single JVM.
A suggested solution (which still needs to be ironed out) is to have a BufferAllocator that controls allocation of byte[] and allows unused byte[] to be returned to it. It will have a cap on the amount of memory it allows to be allocated. The NRT caching dir will use the allocator, which can either be provided (for usage across several indices) or created internally. The caching dir will also create a wrapped IndexOutput that will flush to the main dir if the allocator can no longer provide byte[] (exhausted). When a file is flushed from the cache to the main directory, it will return all the currently allocated byte[] to the BufferAllocator to be reused by other files.
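The BufferAllocator suggestion above can be sketched as a capped pool of fixed-size byte[] blocks. Since the design was explicitly not ironed out yet, everything below (names, signatures, the null-on-exhaustion contract) is an assumption for illustration only.

```java
import java.util.ArrayDeque;

/**
 * Sketch of the proposed BufferAllocator: hands out fixed-size byte[]
 * blocks up to a global byte cap and recycles returned blocks, so the
 * same pool can bound memory across several indices in one JVM. A
 * caller that gets null (pool exhausted) would flush to the main
 * directory instead of caching.
 */
class BufferAllocator {
    private final int blockSize;
    private final long maxBytes;
    private long allocatedBytes = 0;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    BufferAllocator(int blockSize, long maxBytes) {
        this.blockSize = blockSize;
        this.maxBytes = maxBytes;
    }

    /** Returns a reused or fresh block, or null if the cap is exhausted. */
    synchronized byte[] allocate() {
        byte[] b = free.poll();                       // reuse first
        if (b != null) return b;
        if (allocatedBytes + blockSize > maxBytes) return null;
        allocatedBytes += blockSize;
        return new byte[blockSize];
    }

    /** Called when a cached file is flushed: its blocks become reusable. */
    synchronized void release(byte[] block) {
        free.push(block);
    }
}
```

Recycling whole blocks also addresses problem 2 above: the pool keeps long-lived arrays alive instead of churning allocations that get promoted to the old generation.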
[jira] [Commented] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control
[ https://issues.apache.org/jira/browse/LUCENE-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101567#comment-13101567 ] Michael McCandless commented on LUCENE-3425:
Also, a quick win on trunk is to use IOContext's FlushInfo.estimatedSegmentSize to decide up front whether to try caching or not. I.e., if the to-be-flushed segment is too large, we should not cache it.
[jira] [Created] (SOLR-2752) leader-per-shard
leader-per-shard Key: SOLR-2752 URL: https://issues.apache.org/jira/browse/SOLR-2752 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Yonik Seeley Fix For: 4.0
We need to add metadata into ZooKeeper about who is the leader for each shard, and have some kind of leader election.
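The standard ZooKeeper recipe for "some kind of leader election" is ephemeral sequential nodes: each replica creates one under the shard's election path, and the replica holding the lowest live sequence number is the leader; when its ephemeral node vanishes, leadership passes to the next lowest. Below is a minimal in-memory simulation of that rule only, not actual ZooKeeper client code; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory simulation of the ephemeral-sequential-node election rule:
 * replicas register under increasing sequence numbers, and the lowest
 * live sequence number is the shard leader. drop() simulates an
 * ephemeral node disappearing on session expiry.
 */
class ShardLeaderElection {
    private final TreeMap<Integer, String> nodes = new TreeMap<>();
    private int nextSeq = 0;

    /** Replica joins the election; returns its sequence number. */
    synchronized int register(String replicaId) {
        int seq = nextSeq++;
        nodes.put(seq, replicaId);
        return seq;
    }

    /** Simulates the replica's ephemeral node going away. */
    synchronized void drop(int seq) {
        nodes.remove(seq);
    }

    /** Leader = replica with the lowest live sequence number. */
    synchronized String leader() {
        Map.Entry<Integer, String> e = nodes.firstEntry();
        return e == null ? null : e.getValue();
    }
}
```

In a real deployment, ZooKeeper supplies the sequence numbers and liveness via ephemeral znodes and watches; this sketch only shows why the rule gives exactly one leader per shard with automatic failover.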
issue SOLR-1565 - StreamingUpdateSolrServer and Javabin
Hi, I've made an alpha version of StreamingUpdateSolrServer dedicated to binary updates (javabin). It works fine for me. It is not a fix for SOLR-1565 itself; it is a new class, but it may be useful in fixing the bug. If somebody tests it, please send feedback. Patrick Sauts. BinaryStreamingUpdateSolrServer.java Description: Binary data
[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-1979: -- Fix Version/s: (was: 3.4) 3.5
Moving to 3.5.
Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Labels: UpdateProcessor Fix For: 3.5 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch
Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor.
[jira] [Updated] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)
[ https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2742: -- Fix Version/s: (was: 3.4) 3.5
Add commitWithin to convenience signatures for SolrServer.add(..) - Key: SOLR-2742 URL: https://issues.apache.org/jira/browse/SOLR-2742 Project: Solr Issue Type: Improvement Components: clients - java Reporter: Jan Høydahl Assignee: Jan Høydahl Labels: SolrJ, commitWithin Fix For: 3.5, 4.0 Attachments: SOLR-2742.patch, SOLR-2742.patch, SOLR-2742.patch
Today you need to manually create an UpdateRequest in order to set the commitWithin value. We should provide an optional commitWithin parameter on all SolrServer.add(..) methods as a convenience.
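The proposed convenience amounts to add(..) overloads that build the UpdateRequest internally. The sketch below shows the shape with minimal stand-in types; these are not the actual SolrJ classes, only an illustration of how an optional commitWithin parameter would thread through.

```java
/**
 * Sketch of the proposal: overloads of add(..) that set commitWithin on
 * an internally built update request, so callers need not construct an
 * UpdateRequest themselves. All types here are stand-ins for SolrJ's.
 */
class SolrServerSketch {
    /** Minimal stand-in for SolrJ's UpdateRequest. */
    static class UpdateRequest {
        Object doc;
        int commitWithinMs = -1;   // -1 = no commitWithin requested
    }

    UpdateRequest lastRequest;     // recorded here purely for illustration

    /** Existing convenience signature: no commitWithin. */
    void add(Object doc) {
        add(doc, -1);
    }

    /** Proposed overload: commitWithin in milliseconds. */
    void add(Object doc, int commitWithinMs) {
        UpdateRequest req = new UpdateRequest();
        req.doc = doc;
        req.commitWithinMs = commitWithinMs;
        lastRequest = req;         // a real client would send the request here
    }
}
```

The old single-argument form delegates to the new overload, so existing callers are unaffected while new callers get commitWithin without touching UpdateRequest.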
[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101612#comment-13101612 ] Lance Norskog commented on SOLR-1979: -
I'm impressed! This is a lot of work and empirical testing for a difficult problem. Comments: There are a few parameters that are true/false, but in the future you might want a third answer. It might be worth making the decision via a keyword so you can add new keywords later. About the multiple-languages-in-one-field problem: you can't solve everything at once. The other document analysis components like UIMA should be able to identify parts of documents, and then you use this on one part at a time. This is the point of a modular toolkit: you combine the tools to solve advanced problems.
Re: issue SOLR-1565 - StreamingUpdateSolrServer and Javabin
Hey Patrick - Can you make a JIRA issue and post the code there? Thanks, Ryan
On Fri, Sep 9, 2011 at 5:47 PM, Patrick Sauts patrick.via...@gmail.com wrote: Hi, I've made an alpha version of StreamingUpdateSolrServer dedicated to binary updates (javabin). It works fine for me. It is not a fix for SOLR-1565 itself; it is a new class, but it may be useful in fixing the bug. If somebody tests it, please send feedback. Patrick Sauts.
[jira] [Updated] (SOLR-1565) StreamingUpdateSolrServer should support RequestWriter API
[ https://issues.apache.org/jira/browse/SOLR-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Sauts updated SOLR-1565: Attachment: BinaryStreamingUpdateSolrServer.java
I've made an alpha version of StreamingUpdateSolrServer dedicated to binary updates using javabin. It works fine for me. It is not a fix for SOLR-1565 itself; it is a new class, but it may be useful in fixing the bug. If somebody tests it, please send feedback. Patrick Sauts
StreamingUpdateSolrServer should support RequestWriter API -- Key: SOLR-1565 URL: https://issues.apache.org/jira/browse/SOLR-1565 Project: Solr Issue Type: Improvement Components: clients - java, update Affects Versions: 1.4 Reporter: Shalin Shekhar Mangar Fix For: 3.4, 4.0 Attachments: BinaryStreamingUpdateSolrServer.java
StreamingUpdateSolrServer is hard-coded to write XML data. It should integrate the RequestWriter API so that it can be used to send binary update payloads.
[jira] [Issue Comment Edited] (SOLR-1565) StreamingUpdateSolrServer should support RequestWriter API
[ https://issues.apache.org/jira/browse/SOLR-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101648#comment-13101648 ] Patrick Sauts edited comment on SOLR-1565 at 9/9/11 11:36 PM: --
I've made an alpha version of StreamingUpdateSolrServer dedicated to binary updates using javabin. It works fine for me. I have attached it to this issue as BinaryStreamingUpdateSolrServer.java. It is not a fix for SOLR-1565 itself; it is a new class, but it may be useful in fixing the bug. If somebody tests it, please send feedback. Patrick Sauts
was (Author: pathedog): I've made an alpha version of StreamingUpdateSolrServer dedicated to binary updates using javabin. It works fine for me. It is not a fix for SOLR-1565 itself; it is a new class, but it may be useful in fixing the bug. If somebody tests it, please send feedback. Patrick Sauts
RE: issue SOLR-1565 - StreamingUpdateSolrServer and Javabin
I have attached the file to the issue SOLR-1565. Hoping this will help. Patrick Sauts.
-Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: Friday, September 09, 2011 3:57 PM To: dev@lucene.apache.org Subject: Re: issue SOLR-1565 - StreamingUpdateSolrServer and Javabin
Hey Patrick - Can you make a JIRA issue and post the code there? Thanks, Ryan
On Fri, Sep 9, 2011 at 5:47 PM, Patrick Sauts patrick.via...@gmail.com wrote: Hi, I've made an alpha version of StreamingUpdateSolrServer dedicated to binary updates (javabin). It works fine for me. It is not a fix for SOLR-1565 itself; it is a new class, but it may be useful in fixing the bug. If somebody tests it, please send feedback. Patrick Sauts.