Fwd: [Orekit Developers] Distribution of orekit-python egg

2011-09-09 Thread Petrus Hyvönen
Hi JCC developers,

As this discussion could be of interest to others who make Python
modules using JCC, I am cross-posting it.

I am really not experienced in the licensing issues of open source, closed
source etc.

A JCC-generated .egg of an external Java library does contain some JCC code; is
there something I need to do to ensure that the JCC license is fulfilled?

In my case the wrapped library is also under the Apache license, so I guess it
should not be an issue to put the egg under Apache as well?

Regards
/Petrus

-- Forwarded message --
From: Petrus Hyvönen petrus.hyvo...@gmail.com
Date: Fri, Sep 9, 2011 at 10:33 AM
Subject: Re: [Orekit Developers] Distribution of orekit-python egg
To: orekit-develop...@orekit.org


Hi Luc,

Good to hear; I think an easy entry to Orekit through Python is a good
thing.

I really don't know much about the licensing stuff.

- My work in this process is rather minimal: issuing the right command-line
parameters.
- The JCC tool that I use for wrapping is under the Apache 2.0 license as well.
http://pypi.python.org/pypi/JCC/
- Orekit and the commons math library are included in the egg (commons math
is assumed to be under Apache as well).

The Microsoft compiler has been used in the process, but as I understand it
this does not affect the licensing?

Please send me the paper so I can understand it better; I am not really
clear right now on what the actual content of the authorization is.

Regards
/Petrus




On Fri, Sep 9, 2011 at 10:04 AM, MAISONOBE Luc luc.maison...@c-s.fr wrote:

 Hi Petrus,

 Petrus Hyvönen petrus.hyvo...@gmail.com a écrit :


  Hi,

 This is mostly a legal/policy question,

 Would the license of Orekit allow, and would it be preferred, that I create
 some Python eggs (Python installer packages) for Orekit?


 The license allows it and it would be interesting to do.



  I am using JCC to
 wrap the Orekit Java library so it is accessible from standard Python. All
 instructions needed are on the Orekit wiki.

 After an update to Python 2.7 and a rebuild of the wrapping, I realized
 that there are some caveats, at least on the Windows operating system
 (you need Microsoft Visual C++ installed), to create the wrapping,
 but not to run it.  Once built, I can easily make eggs that could be
 put
 on the Orekit or another webpage and easily installed, without the need of
 that compiler etc.


 If you want it, and if you choose to have all additional wrapping code and
 packaging distributed under the same terms as Orekit, i.e. the Apache 2
 Software License, then yes, we could host them here on the main Orekit site.
 You would of course be credited for that. Before we can really distribute
 your work, we would ask you to sign an Individual Contributor License
 Agreement, which basically says you authorize us to distribute what you did.
 We follow exactly the same process as the Apache Software Foundation for
 that. I can send you the template of the paper.

 Thanks
 Luc



 Let me know what you think
 /Petrus




 ----------------------------------------------------------------
 This message was sent using IMP, the Internet Messaging Program.





-- 
_
Petrus Hyvönen, Uppsala, Sweden
Mobile Phone/SMS:+46 73 803 19 00



-- 
_
Petrus Hyvönen, Uppsala, Sweden
Mobile Phone/SMS:+46 73 803 19 00


[jira] [Created] (SOLR-2751) TermsComponent terms.regex and terms.upper does not always work

2011-09-09 Thread Stephan Meisinger (JIRA)
TermsComponent terms.regex and terms.upper does not always work
---

 Key: SOLR-2751
 URL: https://issues.apache.org/jira/browse/SOLR-2751
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 3.3
 Environment: Solr 3.3
Reporter: Stephan Meisinger


TermsComponent with a regex checks the upper bound only after a successful regex match.

example:

terms.regex.flag=case_insensitive
terms.fl=suggest_fr
terms.limit=10
terms.regex=a.*
terms.lower=A
terms.upper=b

will also check terms starting with 'b' up to 'z', which isn't needed; 
for this example, upper is effectively ignored. Currently the checks are done in this order:

[lower] - start loop at
[regexp] - miss: continue
[upper] - miss: break
[freq] - miss: continue

It should be:

[lower] - start loop at
[upper] - miss: break
[freq] - miss: continue (I think a double compare is much faster than a standard 
regexp)
[regexp] - miss: continue
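
A minimal sketch of the proposed ordering (illustrative names, not the actual 
TermsComponent code); the cheap bound and frequency checks run before the 
comparatively expensive regex match:

{code}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch only: scan a sorted term list with the cheap checks first.
class TermScanSketch {
  static Map<String, Integer> scan(String[] sortedTerms, int[] docFreqs,
                                   String upper, int minFreq, Pattern regex) {
    Map<String, Integer> hits = new LinkedHashMap<String, Integer>();
    for (int i = 0; i < sortedTerms.length; i++) {
      String term = sortedTerms[i];
      if (upper != null && term.compareTo(upper) > 0)
        break;                    // [upper] miss: stop the scan entirely
      if (docFreqs[i] < minFreq)
        continue;                 // [freq] miss: cheap integer compare
      if (regex != null && !regex.matcher(term).matches())
        continue;                 // [regex] miss: the expensive test runs last
      hits.put(term, docFreqs[i]);
    }
    return hits;
  }
}
{code}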





--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101075#comment-13101075
 ] 

Simon Willnauer commented on LUCENE-3416:
-

I am planning to commit this today, any objections?

 Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
 limit merge IO across several directories / instances
 --

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon
Assignee: Simon Willnauer
 Attachments: LUCENE-3416.patch, LUCENE-3416.patch


 This can come in handy when running several Lucene indices in the same VM, 
 and wishing to rate limit merge across all of them.
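
 A rough sketch of the idea (the RateLimiter type below is illustrative, not
 the exact API added by the patch): one shared limiter instance paces merge
 writes across every directory in the same VM.

{code}
// Sketch only: each merge thread calls pause() before writing a chunk;
// a single shared instance bounds the combined write rate.
class RateLimiterSketch {
  private final double mbPerSec;
  private long nextFreeNanos = System.nanoTime();

  RateLimiterSketch(double mbPerSec) { this.mbPerSec = mbPerSec; }

  // Reserve a time slice proportional to `bytes`, then sleep until the
  // reservation starts, keeping the average rate at or below mbPerSec.
  synchronized void pause(long bytes) throws InterruptedException {
    double seconds = (bytes / (1024.0 * 1024.0)) / mbPerSec;
    nextFreeNanos = Math.max(nextFreeNanos, System.nanoTime())
        + (long) (seconds * 1e9);
    long sleepNanos = nextFreeNanos - System.nanoTime();
    if (sleepNanos > 0)
      Thread.sleep(sleepNanos / 1000000L, (int) (sleepNanos % 1000000L));
  }
}
{code}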

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Regarding Transaction logging

2011-09-09 Thread Simon Willnauer
I created LUCENE-3424 for this. But I still would like to keep the
discussion open here rather than moving this entirely to an issue.
There is more about this than only the seq. ids.

simon

On Thu, Sep 8, 2011 at 5:35 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 On Thu, Sep 8, 2011 at 11:26 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Returning a long seqID seems the least invasive change to make this
 total ordering possible?  Especially since the DWDQ already computes
 this order...

 +1
 This seems like the most powerful option.

 -Yonik
 http://www.lucene-eurocon.com - The Lucene/Solr User Conference

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3424) Return sequence ids from IW update/delete/add/commit to allow total ordering outside of IW

2011-09-09 Thread Simon Willnauer (JIRA)
Return sequence ids from IW update/delete/add/commit to allow total ordering 
outside of IW
--

 Key: LUCENE-3424
 URL: https://issues.apache.org/jira/browse/LUCENE-3424
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


Based on the discussion on the [mailing 
list|http://mail-archives.apache.org/mod_mbox/lucene-dev/201109.mbox/%3CCAAHmpki-h7LUZGCUX_rfFx=q5-YkLJei+piRG=oic8d1pnr...@mail.gmail.com%3E]
 IW should return sequence ids from update/delete/add and commit to allow 
ordering of events for consistent transaction logs and recovery.
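
A hedged sketch of how a caller could consume such sequence ids (names 
illustrative; this is not code from the issue): operations arrive from 
concurrent indexing threads in arbitrary order, but replay follows the ids 
IW handed out.

{code}
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch only: seqId is the long assumed to be returned by the
// IndexWriter add/update/delete/commit calls proposed here.
class OrderedTLogSketch {
  private final ConcurrentSkipListMap<Long, String> ops =
      new ConcurrentSkipListMap<Long, String>();

  void record(long seqId, String op) {
    ops.put(seqId, op);               // kept sorted by seq id
  }

  // Replay in the exact order IW applied the operations (e.g. recovery).
  void replay() {
    for (String op : ops.values())
      System.out.println("redo: " + op);
  }
}
{code}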

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Checkout SearchWorkings.org - it just went live!

2011-09-09 Thread Simon Willnauer
Hey folks,

Some of you might have heard: a small group of other passionate
search technology professionals and I have been working hard over
the last few months to launch a community site known as
SearchWorkings.org [1]. This initiative gives search professionals
a single point of contact and a comprehensive resource where they
can learn and talk about all the exciting new developments in the
world of open source search.

Anyone familiar with open source search knows that technologies
like Lucene and Solr have grown tremendously in popularity over
the years, but with this growth have also come a number of
challenges, such as limited support and education. With the
launch of SearchWorkings.org we are convinced we can resolve
some of these challenges.

Covering open source search technologies from Apache Lucene and Apache
Solr to Apache Mahout, one of the key objectives for the community is
to create a place where search specialists can engage with one another
and enjoy a single point of contact for various resources, downloads
and documentation.

Like any other community website, content will be added on a regular
basis, and community members can also make their own contributions and
stay on top of everything search related. For now, there is access
to an extensive resource centre offering online tutorials, downloads,
white papers and access to a host of search specialists in the forum.
With the ability to post blog items and keep up to date with relevant
news, the site is a search specialist's dream come true and addresses
what we felt was a clear need in the market.

SearchWorkings.org starts off with an initial focus on Lucene, Solr &
Friends but aims to be much broader. Each of you can & should
contribute: tell us your search, data-processing, setup or
optimization story. I am looking forward to more and more blogs,
articles and tutorials about smaller projects like Apache Lucy, real-world
case studies or third-party extensions for OSS search components.

have fun,

Simon

[1] http://www.searchworkings.org
[2] Trademark Acknowledgement: Apache Lucene, Apache Solr, Apache
Mahout and Apache Lucy and their respective logos are trademarks of The
Apache Software Foundation. All other marks mentioned may be trademarks
or registered trademarks of their respective owners.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Regarding Transaction logging

2011-09-09 Thread Andrzej Bialecki

On 09/09/2011 11:00, Simon Willnauer wrote:

I created LUCENE-3424 for this. But I still would like to keep the
discussion open here rather than moving this entirely to an issue.
There is more about this than only the seq. ids.


I'm concerned also about the content of the transaction log. In Solr it 
uses javabin-encoded UpdateCommand-s (either SolrInputDocuments or 
Delete/Commit commands). Documents in the log are raw documents, i.e. 
before analysis.


This may have some merits for Solr (e.g. you could imagine having 
different analysis chains on the Solr slaves), but IMHO it's more of a 
hassle for Lucene, because it means that the analysis has to be repeated 
over and over again on all clients. If the analysis chain is costly 
(e.g. NLP) then it would make sense to have an option to log documents 
post-analysis, i.e. as correctly typed stored values (e.g. string -> 
numeric) AND the resulting TokenStream-s. This also has the advantage of 
moving us towards the dumb IndexWriter concept, i.e. separating 
analysis from the core inverted index functionality.


So I'd argue for recording post-analysis docs in the tlog, either 
exclusively or as a default option.
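
A minimal sketch of what logging one field post-analysis could look like, 
assuming Lucene's attribute-based TokenStream API (3.1+); the wire format 
below (term + offsets per token) is made up purely for illustration:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Sketch only: run the (possibly costly) analysis chain once and log
// term + offsets per token, so replicas can index without re-analyzing.
class PostAnalysisLogSketch {
    static byte[] encodeField(Analyzer a, String field, String text)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        TokenStream ts = a.tokenStream(field, new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        OffsetAttribute off = ts.addAttribute(OffsetAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            out.writeUTF(term.toString());   // analyzed term text
            out.writeInt(off.startOffset()); // typed, ready to index
            out.writeInt(off.endOffset());
        }
        ts.end();
        ts.close();
        return bytes.toByteArray();
    }
}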


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Updated] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2011-09-09 Thread Koji Sekiguchi
Thanks for the recovery, Robert!

Koji Sekiguchi from mobile


On 2011/09/09, at 14:49, Robert Muir (JIRA) j...@apache.org wrote:

 
 [ 
 https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  ]
 
 Robert Muir updated LUCENE-1889:
 
 
Attachment: LUCENE-1889_reader.patch
 
 here is the patch I applied, might not be the best or whatever, and see the 
 TODO/note in the code.
 
 FastVectorHighlighter: support for additional queries
 -
 
Key: LUCENE-1889
URL: https://issues.apache.org/jira/browse/LUCENE-1889
Project: Lucene - Java
 Issue Type: Wish
 Components: modules/highlighter
   Reporter: Robert Muir
   Assignee: Koji Sekiguchi
   Priority: Minor
Fix For: 3.5, 4.0
 
Attachments: LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889.patch, 
 LUCENE-1889_reader.patch
 
 
 I am using fastvectorhighlighter for some strange languages and it is 
 working well! 
 One thing I noticed immediately is that many query types are not highlighted 
 (multitermquery, multiphrasequery, etc)
 Here is one thing Michael M posted in the original ticket:
 {quote}
 I think a nice [eventual] model would be if we could simply re-run the
 scorer on the single document (using InstantiatedIndex maybe, or
 simply some sort of wrapper on the term vectors which are already a
 mini-inverted-index for a single doc), but extend the scorer API to
 tell us the exact term occurrences that participated in a match (which
 I don't think is exposed today).
 {quote}
 Due to strange requirements I am using something similar to this (but 
 specialized to our case).
 I am doing strange things like forcing multitermqueries to rewrite into 
 boolean queries so they will be highlighted,
 and flattening multiphrasequeries into boolean or'ed phrasequeries.
 I do not think these things would be 'fast', but i had a few ideas that 
 might help:
 * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
 by calling getQuery() right?
 * maybe as a last resort, try Query.extractTerms() ?
 
 --
 This message is automatically generated by JIRA.
 For more information on JIRA, see: http://www.atlassian.com/software/jira
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Regarding Transaction logging

2011-09-09 Thread eks dev
+1
Indeed! All possibilities are needed.

One might do wild things if it is somehow typed. For example,
dictionary compression for fields that are tokenized (not only
stored), as we already have a term dictionary supporting ords. Keeping
just a map Token -> ord with the transaction log...




On Fri, Sep 9, 2011 at 11:19 AM, Andrzej Bialecki a...@getopt.org wrote:
 On 09/09/2011 11:00, Simon Willnauer wrote:

 I created LUCENE-3424 for this. But I still would like to keep the
 discussion open here rather than moving this entirely to an issue.
 There is more about this than only the seq. ids.

 I'm concerned also about the content of the transaction log. In Solr it uses
 javabin-encoded UpdateCommand-s (either SolrInputDocuments or Delete/Commit
 commands). Documents in the log are raw documents, i.e. before analysis.

 This may have some merits for Solr (e.g. you could imagine having different
 analysis chains on the Solr slaves), but IMHO it's more of a hassle for
 Lucene, because it means that the analysis has to be repeated over and over
 again on all clients. If the analysis chain is costly (e.g. NLP) then it
 would make sense to have an option to log documents post-analysis, i.e. as
 correctly typed stored values (e.g. string -> numeric) AND the resulting
 TokenStream-s. This has also the advantage of moving us towards the dumb
 IndexWriter concept, i.e. separating analysis from the core inverted index
 functionality.

 So I'd argue for recording post-analysis docs in the tlog, either
 exclusively or as a default option.

 --
 Best regards,
 Andrzej Bialecki     
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java

2011-09-09 Thread swiss knife
Hello,

 Sorry for the newbie question; I just checked out Solr using Maven and I get 
the following compile error in Eclipse:

 The method setXIncludeAware(boolean) is undefined for the type 
DocumentBuilderFactory Config.java

 /solr/core/src/java/org/apache/solr/core line 113 Java Problem

 Am I missing components? Where do you guys go to check in case of similar 
problems?

 Thank you.


Re: Regarding Transaction logging

2011-09-09 Thread Andrzej Bialecki

On 09/09/2011 12:07, eks dev wrote:

+1
Indeed! All possibilities are needed.

One might do wild things if it is somehow typed. For example,
dictionary compression for fields that are tokenized (not only
stored), as we already have a term dictionary supporting ords. Keeping
just a map Token -> ord with the transaction log...


Hmm, you mean a per-doc map? Because a global map would have to be 
updated as we add new docs, which would make the writing process 
non-atomic, which is the last thing you want from a transaction log :)


As a per-doc compression, sure. In fact, what you describe is 
essentially a single-doc mini-index, because the map is a term dict, the 
token streams with ords are postings, etc.
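
Something like the following toy per-doc dictionary illustrates the point 
(purely illustrative, not a proposed tlog format):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: each tlog record carries its own token -> ord map plus an
// ord stream, i.e. a single-doc mini-index; no state spans records.
class PerDocDictSketch {
    final Map<String, Integer> dict = new LinkedHashMap<String, Integer>();
    final List<Integer> ordStream = new ArrayList<Integer>();

    void add(String token) {
        Integer ord = dict.get(token);
        if (ord == null) {
            ord = Integer.valueOf(dict.size()); // ords in first-seen order
            dict.put(token, ord);
        }
        ordStream.add(ord);   // repeated tokens cost one small int each
    }
}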


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-09 Thread Jasper van Veghel (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101113#comment-13101113
 ] 

Jasper van Veghel commented on SOLR-2066:
-

Great - ngroups is coming through now! Another thing I noticed is that 
highlighting doesn't work on account of the resultIds not getting set in the 
ResponseBuilder. It only happens in combination with _distributed_ grouping - 
so it works when I do this:

http://localhost:8983/solr/foo/select?wt=json&rows=2&group=true&group.field=dcterms_source&group.ngroups=true&hl=true

Or this:

http://localhost:8983/solr/foo/select?wt=json&rows=2&shards=localhost:8983/solr/foo,localhost:8983/solr/bar&hl=true

But not this:

http://localhost:8983/solr/foo/select?wt=json&rows=2&group=true&group.field=dcterms_source&group.ngroups=true&shards=localhost:8983/solr/foo,localhost:8983/solr/bar&hl=true

Stacktrace:

{code}SEVERE: java.lang.NullPointerException
at 
org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:156)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1407)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:680){code}

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch


 Support distributed field collapsing / search grouping.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



(SOLR-2726) NullPointerException when using spellcheck.q

2011-09-09 Thread Bernd Fehling

Well, the patch is attached to the issue and the problem is fixed.
But how do I commit it to svn so that it is fixed with the next build?

And how do I set Status and Resolution for this issue?
Any idea?

So I'm really trying to help, but if I fix something and can't get
it into svn so that it is fixed in later versions, I have to fork
and maintain my own version with further fixes and patches.

Bernd

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Regarding Transaction logging

2011-09-09 Thread eks dev
I didn't think it through, it was just a spontaneous reaction :)

At the moment I am using static dictionaries to at least get a grip on
the size of stored fields (escaping encoded terms).

Re: Global
Maybe the trick would be to somehow use the term dictionary, as it must be
*eventually* updated? An idea is to write the raw token stream for
atomicity and reduce it later in a compaction phase (e.g. on Lucene
commit())... no matter what we plan to do, TL compaction is going to be
needed?

It is slightly a moving-target problem (the TL chases the term dictionary),
but I am sure the benefits can be huge. A compacted TL entry would need to
have a pointer to the Term[] used to encode it, but this is by all means
doable, just a simple Term[].

It surely does not make much sense for high-cardinality fields, but if you
have something with low cardinality (indexed and stored) on a big
(100 million) collection, this reduces space by exorbitant amounts.


I do not know, just trying to build upon the fact that we have the term
dictionary updated in any case...


This works not only for transaction logging, but also for
(analyzed) {stored, indexed} fields. By the way, I never looked at how
our term vectors work: do they keep a reference to the token or a verbatim
term copy?





On Fri, Sep 9, 2011 at 12:31 PM, Andrzej Bialecki a...@getopt.org wrote:
 On 09/09/2011 12:07, eks dev wrote:

 +1
 Indeed! All possibilities are needed.

 One might do wild things if it is somehow typed. For example,
 dictionary compression for fields that are tokenized (not only
 stored), as we already have a term dictionary supporting ords. Keeping
 just a map Token -> ord with the transaction log...

 Hmm, you mean a per-doc map? Because a global map would have to be updated
 as we add new docs, which would make the writing process non-atomic, which
 is the last thing you want from a transaction log :)

 As a per-doc compression, sure. In fact, what you describe is essentially a
 single-doc mini-index, because the map is a term dict, the token streams
 with ords are postings, etc.

 --
 Best regards,
 Andrzej Bialecki     
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2011-09-09 Thread Mike Sokolov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Sokolov updated LUCENE-1889:
-

Attachment: LUCENE-1889-solr.patch

Sorry, forgot to include changes to DefaultSolrHighlighter as well (it gets 
confusing maintaining multiple patches in the same build).

I do think the non-reader method should be deprecated as in Robert's comment.

 FastVectorHighlighter: support for additional queries
 -

 Key: LUCENE-1889
 URL: https://issues.apache.org/jira/browse/LUCENE-1889
 Project: Lucene - Java
  Issue Type: Wish
  Components: modules/highlighter
Reporter: Robert Muir
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-1889-solr.patch, LUCENE-1889.patch, 
 LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889_reader.patch


 I am using fastvectorhighlighter for some strange languages and it is working 
 well! 
 One thing I noticed immediately is that many query types are not highlighted 
 (multitermquery, multiphrasequery, etc)
 Here is one thing Michael M posted in the original ticket:
 {quote}
 I think a nice [eventual] model would be if we could simply re-run the
 scorer on the single document (using InstantiatedIndex maybe, or
 simply some sort of wrapper on the term vectors which are already a
 mini-inverted-index for a single doc), but extend the scorer API to
 tell us the exact term occurrences that participated in a match (which
 I don't think is exposed today).
 {quote}
 Due to strange requirements I am using something similar to this (but 
 specialized to our case).
 I am doing strange things like forcing multitermqueries to rewrite into 
 boolean queries so they will be highlighted,
 and flattening multiphrasequeries into boolean or'ed phrasequeries.
 I do not think these things would be 'fast', but i had a few ideas that might 
 help:
 * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
 by calling getQuery() right?
 * maybe as a last resort, try Query.extractTerms() ?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Regarding Transaction logging

2011-09-09 Thread Andrzej Bialecki

On 09/09/2011 13:20, eks dev wrote:

I didn't think it through, it was just a spontaneous reaction :)

At the moment I am using static dictionaries to at least get a grip on
the size of stored fields (escaping encoded terms).

Re: Global
Maybe the trick would be to somehow use the term dictionary, as it must be
*eventually* updated? An idea is to write the raw token stream for
atomicity and reduce it later in a compaction phase (e.g. on Lucene
commit())... no matter what we plan to do, TL compaction is going to be
needed?


Compaction: not sure, it would have to preserve the ordering of ops. 
But some form of primitive compression, certainly: delta coding, vints, 
etc., anything that can be done per doc, without the need to use data 
that spans more than one record.
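
For example, delta coding plus vints over an ascending sequence needs nothing 
outside the current record (a toy sketch, not an actual tlog codec):

import java.io.ByteArrayOutputStream;

// Sketch only: delta-code an ascending sequence (e.g. token positions)
// and write each delta as a variable-length int, 7 bits per byte.
class VIntDeltaSketch {
    static byte[] encode(int[] ascending) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int v : ascending) {
            int delta = v - prev;
            prev = v;
            while ((delta & ~0x7F) != 0) {   // still more than 7 bits left
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);                // final byte, high bit clear
        }
        return out.toByteArray();
    }
}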




It is slightly a moving-target problem (the TL chases the term dictionary),
but I am sure the benefits can be huge. A compacted TL entry would need to
have a pointer to the Term[] used to encode it, but this is by all means
doable, just a simple Term[].

It surely does not make much sense for high-cardinality fields, but if you
have something with low cardinality (indexed and stored) on a big
(100 million) collection, this reduces space by exorbitant amounts.


I do not know, just trying to build upon the fact that we have the term
dictionary updated in any case...


If the tlog has a Commit op, then you could theoretically compact all 
preceding entries ... at least their term dicts. If you compacted the 
postings, too, then you would essentially have a multi-doc index (naked 
segment), but it would not be a transaction log anymore, because the 
update ordering wouldn't be preserved (e.g. intermediate Delete ops 
would have a different effect).





This works not only for transaction logging, but also for
(analyzed) {stored, indexed} fields. By the way, I never looked at how
our term vectors work: do they keep a reference to the token or a verbatim
term copy?


It's like a term dict + postings; terms are delta front-coded like the 
main term dictionary. It does not reuse terms from the main dict; I 
think this representation was chosen to avoid ord renumbering when the 
main term dict is updated, since you would have to renumber all term 
vectors on each commit...


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java

2011-09-09 Thread Erick Erickson
DocumentBuilderFactory is a Java class, not a Solr class. What version
of Java are you trying to compile against?

Best
Erick

On Fri, Sep 9, 2011 at 5:53 AM, swiss knife swiss_kn...@email.com wrote:

 Hello,

 Sorry for the newbie question; I just checked out Solr using Maven and I get
 the following compile error in Eclipse:

 The method setXIncludeAware(boolean) is undefined for the type
 DocumentBuilderFactory    Config.java

     /solr/core/src/java/org/apache/solr/core    line 113    Java Problem

 Am I missing components? Where do you guys go to check in case of similar
 problems?

 Thank you.



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java

2011-09-09 Thread Andrzej Bialecki

On 09/09/2011 14:08, Erick Erickson wrote:

DocumentBuilderFactory is a Java class, not a Solr class. What version
of Java are you trying to compile against?

Best
Erick

On Fri, Sep 9, 2011 at 5:53 AM, swiss knifeswiss_kn...@email.com  wrote:


Hello,

Sorry for the newbie question; I just checked out Solr using Maven and I get
the following compile error in Eclipse:

The method setXIncludeAware(boolean) is undefined for the type
DocumentBuilderFactoryConfig.java

 /solr/core/src/java/org/apache/solr/coreline 113Java Problem

Am I missing components? Where do you guys go to check in case of similar
problems?


I had a similar problem in another project; it turned out to be a 
conflicting version of the XML APIs in the JRE and a particular version 
of Xerces. AFAIK this issue has been resolved in recent versions of 
Xerces. If you can't upgrade Xerces, try swapping the order of classpath 
entries, so that one (or the other, can't remember which one...) API 
takes precedence.
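
If in doubt, a quick way to see which jar actually supplies the class is 
the sketch below (it prints null when the class comes from the JRE itself):

// Diagnostic sketch: locate the jar providing DocumentBuilderFactory.
public class WhichJar {
    public static void main(String[] args) {
        System.out.println(javax.xml.parsers.DocumentBuilderFactory.class
            .getProtectionDomain().getCodeSource());
    }
}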



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2366) Facet Range Gaps

2011-09-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101152#comment-13101152
 ] 

Jan Høydahl commented on SOLR-2366:
---

Hoss: Good comments, which need to be decided upon, including corner cases.

1)
bq. I would suggest we define any case where the spec contains absolute value N 
after (effective) value M, where N < M, as an error and fail fast.
Agree

bq. Still not sure what (if anything) should be done about overlapping ranges 
that appear out of order (ie: 0,100,50..90,150 ... is that 0-100,50-90,90-150 
?)
If all gaps are specified as explicit ranges there is no ambiguity, so we could 
require all gaps to be explicit ranges if one wants to use this?

2) 

bq. The first three examples suggest that * will be treated as -Infinity and 
+Infinity based on position (ie: the first and last ranges will be unbounded 
on one end) but in the last example the wording ...100-200, 200-300 repeating 
until max seems inconsistent with that.
Agree. The 0,10,50,+50,+100,* example would create infinite gaps, which would be 
less than desirable. But 0,10,50,+50,+100,500 would give repeating 100-gaps 
until the upper bound 500, while 0,10,50,+50,+100,500,* would in addition give a 
last range 500-*. That was the intended syntax.

bq. If we want to support the idea of repeat the last increment continuously 
that should be with its own repeat syntax such as the ... (three dots) I 
suggested in comment 17/Feb/11 23:50 above. I would argue that this should 
only be legal after an increment and before a concrete value (ie: 
0,+10,...,100). Requiring it to follow an increment seems like a given 
(otherwise what exactly are you repeating?); requiring that it be followed by an 
absolute value is based on my concern that if it's the last item in the spec 
(or the last item before *) it results in an infinite number of ranges.

Agree. Alternatively, if Solr could compute myField.max(), the useful value of 
* could be computed a bit smarter, but that would probably be hard to scale 
in a multi-shard setting.

bq. That seems like it isn't specific enough about what is/isn't going to be 
allowed – particularly since all of the facet.range params can be specified on 
a per-field basis.

I didn't really think much about the global params. Silently ignoring gap, 
begin, end, other would be one way to go, but then the error feedback is 
not explicit in case of a misunderstanding; the user will see that he does not 
get back what he expected, and start reading the documentation :)

I have no good answer to this, other than inventing some syntax. The default 
could be that facet.range.spec respects the global values for start and end, 
but we could also allow explicitly overriding start and end values as part of 
the spec with a special syntax.
The following params would result in ranges 0-1, 1-2, 2-3, 3-5, 5-10 :
{noformat}
facet.range.start=0
facet.range.end=10
facet.range.gap=2
f.bedrooms.facet.range.spec=1,2,3,5
{noformat}

But these params would result in the same ranges because we specify start and 
end with a special syntax N.. for start and ..M for end:
{noformat}
facet.range.start=100
facet.range.end=200
facet.range.gap=10
f.bedrooms.facet.range.spec=0..,1,2,3,5,..10
{noformat}

This would be equivalent to adding the two params 
f.bedrooms.facet.range.start=0&f.bedrooms.facet.range.end=10, which could then 
still be allowed as an alternative. If the first value of the spec is not an 
N.., we'll require a facet.range.start. If the last value of the spec is not a 
..M, we'll require facet.range.end.

Also, it must not be allowed to specify both a global facet.range.gap and a 
global facet.range.spec.

Would this be a good compromise? :-) My primary reason for suggesting this is 
to give users a terse, intuitive syntax for ranges (a small parsing sketch of 
the proposed spec values follows at the end of this comment).

4)
bq. Should all ranges produced by facet.range.spec be considered gap ranges? 
even the ones with no lower/upper bound?
Good question. I think the meaning of facet.range.include=upper/lower is clear. 
Outer/edge would need some more work/definition.
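
As promised above, a hedged sketch of parsing the proposed spec values 
(absolute bounds and +N increments only; *, N.., ..M and ... are left out). 
This illustrates the proposal and is not shipped Solr code:

{code}
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed facet.range.spec parsing: plain numbers are
// absolute bounds, +N adds an increment to the previous bound, and an
// out-of-order absolute value fails fast as discussed under 1).
class RangeSpecSketch {
  static List<Double> boundaries(String spec) {
    List<Double> bounds = new ArrayList<Double>();
    double prev = Double.NEGATIVE_INFINITY;
    for (String tok : spec.split(",")) {
      double next = tok.startsWith("+")
          ? prev + Double.parseDouble(tok.substring(1))
          : Double.parseDouble(tok);
      if (next <= prev)
        throw new IllegalArgumentException("out of order: " + tok);
      bounds.add(Double.valueOf(next));
      prev = next;
    }
    return bounds;   // "0,10,50,+50,+100" -> [0, 10, 50, 100, 200]
  }
}
{code}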

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
 to be a comma separated list of sizes for each 

[jira] [Commented] (SOLR-2366) Facet Range Gaps

2011-09-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101185#comment-13101185
 ] 

Jan Høydahl commented on SOLR-2366:
---

I've given the Wiki page another take, with the newly proposed start/end syntax, 
and added an example or two. The mutually exclusive sentence now boils down 
to facet.range.gap/facet.range.spec being mutually exclusive (on the same 
field). Have a look at 
http://wiki.apache.org/solr/VariableRangeGaps#facet.range.spec

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.  I'd propose the syntax 
 to be a comma separated list of sizes for each bucket.  If only one value is 
 specified, then it behaves as it currently does.  Otherwise, it creates the 
 different size buckets.  If the number of buckets doesn't evenly divide up 
 the space, then the size of the last bucket specified is used to fill out the 
 remaining space (not sure on this)
 For instance,
 facet.range.start=0
 facet.range.end=400
 facet.range.gap=5,25,50,100
 would yield buckets of:
 0-5,5-30,30-80,80-180,180-280,280-380,380-400

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2366) Facet Range Gaps

2011-09-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2366:
--

Description: 
There really is no reason why the range gap for date and numeric faceting needs 
to be evenly spaced.  For instance, if and when SOLR-1581 is completed and one 
were doing spatial distance calculations, one could facet by function into 3 
different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) 
and everything else (150KM+), for instance.  We should be able to quantize the 
results into arbitrarily sized buckets.

(Original syntax proposal removed, see discussion for concrete syntax)


  was:
There really is no reason why the range gap for date and numeric faceting needs 
to be evenly spaced.  For instance, if and when SOLR-1581 is completed and one 
were doing spatial distance calculations, one could facet by function into 3 
different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) 
and everything else (150KM+), for instance.  We should be able to quantize the 
results into arbitrarily sized buckets.  I'd propose the syntax to be a comma 
separated list of sizes for each bucket.  If only one value is specified, then 
it behaves as it currently does.  Otherwise, it creates the different size 
buckets.  If the number of buckets doesn't evenly divide up the space, then the 
size of the last bucket specified is used to fill out the remaining space (not 
sure on this)
For instance,
facet.range.start=0
facet.range.end=400
facet.range.gap=5,25,50,100

would yield buckets of:
0-5,5-30,30-80,80-180,180-280,280-380,380-400




Here's Grant's original syntax proposal which is removed from issue description 
to avoid confusion:
{quote}

I'd propose the syntax to be a comma separated list of sizes for each bucket.  
If only one value is specified, then it behaves as it currently does.  
Otherwise, it creates the different size buckets.  If the number of buckets 
doesn't evenly divide up the space, then the size of the last bucket specified 
is used to fill out the remaining space (not sure on this)
For instance,
facet.range.start=0
facet.range.end=400
facet.range.gap=5,25,50,100

would yield buckets of:
0-5,5-30,30-80,80-180,180-280,280-380,380-400
{quote}

 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.
 (Original syntax proposal removed, see discussion for concrete syntax)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2366) Facet Range Gaps

2011-09-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101204#comment-13101204
 ] 

Jan Høydahl commented on SOLR-2366:
---

One thing this improvement needs to tackle is how to return the range buckets 
in the response. The simple range_facet format will not be enough:
{code:xml}
<lst name="facet_ranges">
  <lst name="url_length">
    <lst name="counts">
      <int name="42">1</int>
      <int name="45">1</int>
      <int name="51">1</int>
      <int name="66">1</int>
    </lst>
    <int name="gap">3</int>
    <int name="start">0</int>
    <int name="end">102</int>
  </lst>
</lst>
{code}

We need something which can return the explicit ranges, similar to what 
facet_queries has. This format can then be used for the old plain gap format as 
well.

{code:xml}
<lst name="facet_ranges">
  <lst name="url_length">
    <lst name="counts">
      <int name="[42 TO 45}">1</int>
      <int name="[45 TO 48}">1</int>
      <int name="[51 TO 54}">1</int>
      <int name="[66 TO 69}">1</int>
    </lst>
    <int name="gap">3</int>
    <int name="start">0</int>
    <int name="end">102</int>
  </lst>
  <lst name="bedrooms">
    <lst name="counts">
      <int name="[1 TO *]">12</int>
      <int name="[2 TO *]">31</int>
      <int name="[3 TO *]">26</int>
      <int name="[4 TO *]">9</int>
    </lst>
    <int name="spec">1..*,2..*,3..*,4..*</int>
    <int name="include">all</int>
  </lst>
</lst>
{code}


 Facet Range Gaps
 

 Key: SOLR-2366
 URL: https://issues.apache.org/jira/browse/SOLR-2366
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: SOLR-2366.patch, SOLR-2366.patch


 There really is no reason why the range gap for date and numeric faceting 
 needs to be evenly spaced.  For instance, if and when SOLR-1581 is completed 
 and one were doing spatial distance calculations, one could facet by function 
 into 3 different sized buckets: walking distance (0-5KM), driving distance 
 (5KM-150KM) and everything else (150KM+), for instance.  We should be able to 
 quantize the results into arbitrarily sized buckets.
 (Original syntax proposal removed, see discussion for concrete syntax)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Matching String + Matching String positions / Lucene 2.3.1

2011-09-09 Thread Jérôme

Hi,

I'm currently working with Lucene 2.3.1.

The aim of my application is to query (from an HTML form) my Lucene 
index, then display results (in a web page) with:

 - matching strings
 - matching string positions / offsets

For example:
-The document:
when i was a child, i was a Jedi.

-The query:
w*

I would like to obtain something like:
 - 3 results
 - matching strings: when, was, was
 - offsets: 0-3; 7-9; 22-24

I've tried to use the TermPositionVector class:
[...]
Hits hits = searcher.search(query);
for (int i = 0; i < hitCount; i++) {
    // term vectors must have been stored with positions/offsets at index time
    TermPositionVector tfv = (TermPositionVector)
        searcher.getIndexReader().getTermFreqVector(hits.id(i), "content");

[...]


I will be able to get positions with this vector, but first I have 
to get the matching strings (when, was for the w* query).
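
One possible route, sketched under the assumption that the content field was 
indexed with Field.TermVector.WITH_POSITIONS_OFFSETS: scan the document's term 
vector for terms matching the prefix, then read their offsets.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.index.TermVectorOffsetInfo;

// Sketch for Lucene 2.3: find the doc's terms matching a prefix and
// print their character offsets from the stored term vector.
class PrefixOffsetsSketch {
    static void print(IndexReader reader, int docId, String prefix)
            throws Exception {
        TermPositionVector tpv =
            (TermPositionVector) reader.getTermFreqVector(docId, "content");
        String[] terms = tpv.getTerms();   // the doc's unique terms, sorted
        for (int i = 0; i < terms.length; i++) {
            if (!terms[i].startsWith(prefix)) continue;
            // null if offsets were not stored at index time
            TermVectorOffsetInfo[] offs = tpv.getOffsets(i);
            if (offs == null) continue;
            for (int j = 0; j < offs.length; j++)
                System.out.println(terms[i] + " "
                    + offs[j].getStartOffset() + "-" + offs[j].getEndOffset());
        }
    }
}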


In your opinion, what is the easiest way?

Thank you


Jérôme


Re: setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java

2011-09-09 Thread swiss knife
Thank you!

 Actually javax.xml.parsers.DocumentBuilderFactory.class is included in 
JRE 1.6 and also in xml-apis-1.0.b2.jar, and the versions of 
DocumentBuilderFactory differ.

 I removed the dependency on xml-apis-1.0.b2.jar and it went through.

 What am I supposed to do from here? Enter a JIRA issue and see what needs to 
be done to commit the change?



 On 09/09/2011 14:08, Erick Erickson wrote:

  DocumentBuilderFactory is a Java class, not a Solr class. What version
  of Java are you trying to compile against?

  Best
  Erick

  On Fri, Sep 9, 2011 at 5:53 AM, swiss knife swiss_kn...@email.com wrote:

   Hello,

   Sorry for the newbie question; I just checked out Solr using Maven and I get
   the following compile error in Eclipse:

   The method setXIncludeAware(boolean) is undefined for the type
   DocumentBuilderFactory Config.java

   /solr/core/src/java/org/apache/solr/core line 113 Java Problem

   Am I missing components? Where do you guys go to check in case of similar
   problems?

 I had a similar problem in another project; it turned out to be a
 conflicting version of the XML APIs in the JRE and a particular version
 of Xerces. AFAIK this issue has been resolved in recent versions of
 Xerces. If you can't upgrade Xerces, try swapping the order of classpath
 entries, so that one (or the other, can't remember which one...) API
 takes precedence.

 --
 Best regards,
 Andrzej Bialecki
 http://www.sigram.com  Contact: info at sigram dot com


[Lucene.Net] How to add document to more than one index (but only analyze once)?

2011-09-09 Thread Robert Stewart
Is it possible to add a document to more than one index at the same time, such 
that document fields are only analyzed one time?  For instance, to add a 
document to both a master index and a smaller near-real-time index.  I would 
like to avoid analyzing document fields more than once, but I don't see whether 
that is possible at all using the Lucene API.

Thanks,
Bob

[jira] [Updated] (SOLR-2750) Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor

2011-09-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2750:
--

Attachment: SOLR-2750.patch

This patch attempts to fix this for trunk

 Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated 
 update.processor
 

 Key: SOLR-2750
 URL: https://issues.apache.org/jira/browse/SOLR-2750
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
 Attachments: SOLR-2750.patch


 CoreAdminHandler#handleMergeAction
 DataImportHandler#handleRequestBody

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 414 - Failure

2011-09-09 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/414/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError: 
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:217)




Build Log (for compile errors):
[...truncated 10911 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2750) Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor

2011-09-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2750:
--

Attachment: SOLR-2750-branch_3x.patch

Patch for branch_3x

 Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated 
 update.processor
 

 Key: SOLR-2750
 URL: https://issues.apache.org/jira/browse/SOLR-2750
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
 Attachments: SOLR-2750-branch_3x.patch, SOLR-2750.patch


 CoreAdminHandler#handleMergeAction
 DataImportHandler#handleRequestBody

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2750) Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated update.processor

2011-09-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2750:
--

Fix Version/s: 4.0
   3.4

 Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated 
 update.processor
 

 Key: SOLR-2750
 URL: https://issues.apache.org/jira/browse/SOLR-2750
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
 Fix For: 3.4, 4.0

 Attachments: SOLR-2750-branch_3x.patch, SOLR-2750.patch


 CoreAdminHandler#handleMergeAction
 DataImportHandler#handleRequestBody

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #236: POMs out of sync

2011-09-09 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/236/

1 tests failed.
REGRESSION:  org.apache.solr.search.TestRealTimeGet.testStressGetRealtime

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw 
uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:689)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:89)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:146)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
at $Proxy0.invoke(Unknown Source)
at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:145)
at 
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:87)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
Caused by: java.lang.AssertionError: Some threads threw uncaught exceptions!
at org.junit.Assert.fail(Assert.java:91)
at 
org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:717)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:661)
... 31 more




Build Log (for compile errors):
[...truncated 23347 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] How to add document to more than one index (but only analyze once)?

2011-09-09 Thread digy digy
How about indexing the new document(s) in memory using RAMDirectory, then
calling indexWriter.AddIndexesNoOptimize for the NRT & master index?
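
A rough Java sketch of the idea (Lucene.Net mirrors this API; method names 
follow Lucene 2.9/3.0, where later versions rename addIndexesNoOptimize to 
addIndexes, and the helper here is illustrative):

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

// Sketch only: analyze once into a small RAM-resident index, then fold
// the already-inverted segments into the master; addIndexesNoOptimize
// copies segments without re-analyzing the document fields.
class AnalyzeOnceSketch {
    static void addEverywhere(Document doc, IndexWriter ramWriter,
                              IndexWriter masterWriter, Directory ramDir)
            throws Exception {
        ramWriter.addDocument(doc);   // the only analysis pass
        ramWriter.commit();           // make the RAM segments readable
        masterWriter.addIndexesNoOptimize(new Directory[] { ramDir });
    }
}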

DIGY

On Fri, Sep 9, 2011 at 5:33 PM, Robert Stewart robert_stew...@epam.comwrote:

 Is it possible to add a document to more than one index at the same time,
 such that document fields are only analyzed one time?  For instance, to add a
 document to both a master index and a smaller near-real-time index.  I
 would like to avoid analyzing document fields more than once, but I don't see
 whether that is possible at all using the Lucene API.

 Thanks,
 Bob


Re: [Lucene.Net] How to add document to more than one index (but only analyze once)?

2011-09-09 Thread Robert Stewart
That sounds like a good plan.  How will that affect existing merge scheduling?  
For the master index I use a merge factor of 2.


On Sep 9, 2011, at 11:44 AM, digy digy wrote:

 How about indexing the new document(s) in memory using RAMDirectory, then
 calling indexWriter.AddIndexesNoOptimize for the NRT & master index?
 
 DIGY
 
 On Fri, Sep 9, 2011 at 5:33 PM, Robert Stewart robert_stew...@epam.comwrote:
 
 Is it possible to add a document to more than one index at the same time,
 such that document fields are only analyzed one time?  For instance, to add a
 document to both a master index and a smaller near-real-time index.  I
 would like to avoid analyzing document fields more than once, but I don't see
 whether that is possible at all using the Lucene API.
 
 Thanks,
 Bob



[VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-09 Thread Michael McCandless
Please vote to release the RC1 artifacts at:

  
https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

as Lucene 3.4.0 and Solr 3.4.0.

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Regarding Transaction logging

2011-09-09 Thread Simon Willnauer
On Fri, Sep 9, 2011 at 11:19 AM, Andrzej Bialecki a...@getopt.org wrote:
 On 09/09/2011 11:00, Simon Willnauer wrote:

 I created LUCENE-3424 for this. But I still would like to keep the
 discussion open here rather than moving this entirely to an issue.
 There is more about this than only the seq. ids.

 I'm concerned also about the content of the transaction log. In Solr it uses
 javabin-encoded UpdateCommand-s (either SolrInputDocuments or Delete/Commit
 commands). Documents in the log are raw documents, i.e. before analysis.

 This may have some merits for Solr (e.g. you could imagine having different
 analysis chains on the Solr slaves), but IMHO it's more of a hassle for
 Lucene, because it means that the analysis has to be repeated over and over
 again on all clients. If the analysis chain is costly (e.g. NLP) then it
 would make sense to have an option to log documents post-analysis, i.e. as
correctly typed stored values (e.g. string -> numeric) AND the resulting
 TokenStream-s. This has also the advantage of moving us towards the dumb
 IndexWriter concept, i.e. separating analysis from the core inverted index
 functionality.

 So I'd argue for recording post-analysis docs in the tlog, either
 exclusively or as a default option.

I am not sure if this should be the default option; I would need to
see how this is implemented. If we can efficiently support such a
pre-analyzed document I am all for it. But I think it should be
possible to write opaque documents too. Other implementations / users
of Lucene should be able to write their app-specific format too.

simon

 --
 Best regards,
 Andrzej Bialecki     
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com
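A minimal sketch of the post-analysis idea discussed above, assuming the Lucene 3.x field API; this only illustrates replaying a cached token stream, and is not Solr's actual transaction-log format:

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class PreAnalyzed {
  /** Run a (possibly costly) analysis chain once and index the cached tokens. */
  public static void indexPreAnalyzed(IndexWriter writer, Analyzer analyzer,
                                      String text) throws IOException {
    // CachingTokenFilter buffers tokens as they are consumed; a tlog could
    // serialize these tokens instead of the raw text and replay them later.
    CachingTokenFilter tokens = new CachingTokenFilter(
        analyzer.tokenStream("body", new StringReader(text)));

    Document doc = new Document();
    doc.add(new Field("body", tokens)); // indexes the pre-analyzed token stream
    writer.addDocument(doc);
  }
}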


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Fwd: [Orekit Developers] Distribution of orekit-python egg

2011-09-09 Thread Andi Vajda


On Fri, 9 Sep 2011, Petrus Hyvönen wrote:


As this discussion could be of potential interest of others who makes python
modules using jcc I cross post it.

I am really not experienced in the licensing issues of open source, closed
source etc.

A JCC generated external java library .egg, do contain some JCC code, is
there something I need to do to ensure that the JCC license is fulfilled?

In my case the wrapped library is also in apache, so I guess it should not
be an issue to put the egg under apache?


As the licenses are the same, I can't imagine an issue here.

In the LICENSE file - or is that the NOTICE file - of your distribution, you 
should include a copy of the Apache2 license for JCC.


That being said, I am not a lawyer...

Giving attribution to Apache JCC is always appreciated, though.

Thanks !

Andi..





[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-09-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2959:


Attachment: LUCENE-2959.patch

Attached is a diff (using dev-tools/scripts/diff-sources.py) between the branch
and trunk.

I think this is ready.

 [GSoC] Implementing State of the Art Ranking for Lucene
 ---

 Key: LUCENE-2959
 URL: https://issues.apache.org/jira/browse/LUCENE-2959
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/query/scoring, general/javadocs, modules/examples
Reporter: David Mark Nemeskey
Assignee: Robert Muir
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: flexscoring branch

 Attachments: LUCENE-2959.patch, LUCENE-2959_mockdfr.patch, 
 LUCENE-2959_nocommits.patch, implementation_plan.pdf, proposal.pdf


 Lucene employs the Vector Space Model (VSM) to rank documents, which compares
 unfavorably to state-of-the-art algorithms such as BM25. Moreover, the
 architecture is tailored specifically to VSM, which makes the addition of new
 ranking functions a non-trivial task.
 This project aims to bring state-of-the-art ranking methods to Lucene and to
 implement a query architecture with pluggable ranking functions.
 The wiki page for the project can be found at
 http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.
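For context, the main ranking extension point in the 3.x API is Similarity, which only exposes the VSM factors; that is what makes models like BM25 awkward to add today. A hedged sketch of the kind of tweak that is currently possible:

import org.apache.lucene.search.DefaultSimilarity;

public class SublinearTfSimilarity extends DefaultSimilarity {
  @Override
  public float tf(float freq) {
    // Dampen term frequency; the surrounding tf*idf*norm formula stays fixed.
    return freq > 0 ? (float) (1.0 + Math.log(freq)) : 0.0f;
  }
}

This would be installed via Searcher.setSimilarity(new SublinearTfSimilarity()); the flexscoring branch replaces the fixed formula with pluggable ranking functions.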




[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-09-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2959:


Attachment: LUCENE-2959.patch

Oops, I had some stuff in solr/example/work that caused a lot of noise in the
patch.

 [GSoC] Implementing State of the Art Ranking for Lucene
 ---

 Key: LUCENE-2959
 URL: https://issues.apache.org/jira/browse/LUCENE-2959
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/query/scoring, general/javadocs, modules/examples
Reporter: David Mark Nemeskey
Assignee: Robert Muir
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: flexscoring branch, 4.0

 Attachments: LUCENE-2959.patch, LUCENE-2959.patch, 
 LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, 
 implementation_plan.pdf, proposal.pdf


 Lucene employs the Vector Space Model (VSM) to rank documents, which compares
 unfavorably to state-of-the-art algorithms such as BM25. Moreover, the
 architecture is tailored specifically to VSM, which makes the addition of new
 ranking functions a non-trivial task.
 This project aims to bring state-of-the-art ranking methods to Lucene and to
 implement a query architecture with pluggable ranking functions.
 The wiki page for the project can be found at
 http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.




[jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-09-09 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2959:


Fix Version/s: 4.0

 [GSoC] Implementing State of the Art Ranking for Lucene
 ---

 Key: LUCENE-2959
 URL: https://issues.apache.org/jira/browse/LUCENE-2959
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/query/scoring, general/javadocs, modules/examples
Reporter: David Mark Nemeskey
Assignee: Robert Muir
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: flexscoring branch, 4.0

 Attachments: LUCENE-2959.patch, LUCENE-2959.patch, 
 LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, 
 implementation_plan.pdf, proposal.pdf


 Lucene employs the Vector Space Model (VSM) to rank documents, which compares
 unfavorably to state-of-the-art algorithms such as BM25. Moreover, the
 architecture is tailored specifically to VSM, which makes the addition of new
 ranking functions a non-trivial task.
 This project aims to bring state-of-the-art ranking methods to Lucene and to
 implement a query architecture with pluggable ranking functions.
 The wiki page for the project can be found at
 http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.




[jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-09-09 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101391#comment-13101391
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

There are many important use cases for immediate / zero-delay index readers.

I'm not sure if people realize it, but one of the major gains from this issue
is the ability to obtain a reader after every indexed document.

In this case, instead of performing an array copy of the RT data structures, we
will queue the changes and then apply them to the new reader.  For arrays like
term freqs, we will use a temp hash map of the changes made since the main
array was created (when the hash map grows too large we can perform a full
array copy).
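For illustration, the NRT pattern whose per-document cost this issue targets, sketched against the Lucene 3.x reader API (names illustrative); today each reopen materializes the RAM buffer, while this issue would let the reader see it directly:

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class PerDocumentReader {
  /** Add one document, then obtain a fresh reader that can see it. */
  public static IndexReader addAndReopen(IndexWriter writer, IndexReader reader,
                                         Document doc) throws IOException {
    writer.addDocument(doc);
    IndexReader newReader = reader.reopen(); // currently flushes/copies state
    if (newReader != reader) {
      reader.close();
      reader = newReader;
    }
    return reader;
  }
}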



 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
 LUCENE-2312.patch, LUCENE-2312.patch


 In order to offer users near-real-time search, without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable.
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing.
 Michael Busch has good suggestions regarding how to handle deletes using max
 doc ids:
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here:
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915




[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search

2011-09-09 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2066:


Attachment: SOLR-2066.patch

Good catch Jasper! I've updated the patch to fix this issue. I also added
highlighting + distributed grouping to the tests.

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.




3.4.0 draft release notes

2011-09-09 Thread Michael McCandless
I just took a first crack at them:

http://wiki.apache.org/solr/ReleaseNote34
http://wiki.apache.org/lucene-java/ReleaseNote34

Feel free to go improve them before we release!

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3416) Allow to pass an instance of RateLimiter to FSDirectory allowing to rate limit merge IO across several directories / instances

2011-09-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3416.
-

Resolution: Fixed

Committed to trunk. Thanks, Shay.

 Allow to pass an instance of RateLimiter to FSDirectory allowing to rate 
 limit merge IO across several directories / instances
 --

 Key: LUCENE-3416
 URL: https://issues.apache.org/jira/browse/LUCENE-3416
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Shay Banon
Assignee: Simon Willnauer
 Attachments: LUCENE-3416.patch, LUCENE-3416.patch


 This can come in handy when running several Lucene indices in the same VM
 and wishing to rate-limit merge IO across all of them.




[jira] [Created] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control

2011-09-09 Thread Shay Banon (JIRA)
NRT Caching Dir to allow for exact memory usage, better buffer allocation and 
global cross indices control


 Key: LUCENE-3425
 URL: https://issues.apache.org/jira/browse/LUCENE-3425
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Shay Banon


A discussion on IRC raised several improvements that can be made to NRT caching 
dir. Some of the problems it currently has are:

1. Not explicitly controlling the memory usage, which can result in overusing 
memory (for example, large new segments being committed because refreshing is 
too far behind).
2. Heap fragmentation because of constant allocation of (probably promoted to 
old gen) byte buffers.
3. Not being able to control the memory usage across indices for multi index 
usage within a single JVM.

A suggested solution (which still needs to be ironed out) is to have a
BufferAllocator that controls allocation of byte[] and allows unused byte[] to
be returned to it. It will have a cap on the size of memory it allows to be
allocated.

The NRT caching dir will use the allocator, which can either be provided (for
usage across several indices) or created internally. The caching dir will also
create a wrapped IndexOutput that will flush to the main dir if the allocator
can no longer provide byte[] (exhausted).

When a file is flushed from the cache to the main directory, it will return 
all the currently allocated byte[] to the BufferAllocator to be reused by other 
files.
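A minimal sketch of what such a BufferAllocator could look like (all names hypothetical; this is one reading of the description above, not a committed design):

import java.util.ArrayDeque;

public class BufferAllocator {
  private final int blockSize;
  private final long maxBytes;
  private long allocatedBytes;
  private final ArrayDeque<byte[]> free = new ArrayDeque<byte[]>();

  public BufferAllocator(int blockSize, long maxBytes) {
    this.blockSize = blockSize;
    this.maxBytes = maxBytes;
  }

  /** Returns a block, or null when the cap is exhausted
   *  (the caller then flushes to the main directory). */
  public synchronized byte[] allocate() {
    byte[] block = free.poll();
    if (block != null) {
      return block; // recycled, no new allocation (less heap fragmentation)
    }
    if (allocatedBytes + blockSize > maxBytes) {
      return null; // cap reached: the wrapped IndexOutput must spill to disk
    }
    allocatedBytes += blockSize;
    return new byte[blockSize];
  }

  /** Called when a cached file is flushed to the main directory. */
  public synchronized void release(byte[] block) {
    free.push(block); // keep the block for reuse by other files
  }
}

Sharing one allocator instance across several caching directories would give the cross-index memory cap the description asks for.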





[jira] [Commented] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control

2011-09-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101567#comment-13101567
 ] 

Michael McCandless commented on LUCENE-3425:


Also, a quick win on trunk is to use IOContext's FlushInfo.estimatedSegmentSize
to decide up front whether to try caching or not.

I.e., if the to-be-flushed segment is too large we should not cache it.
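A hedged sketch of that check, assuming trunk's IOContext/FlushInfo fields at the time; the caching-directory hook and the maxCachedBytes limit are hypothetical:

import org.apache.lucene.store.FlushInfo;
import org.apache.lucene.store.IOContext;

public class CacheDecision {
  /** Cache a newly flushed file only if the flush is estimated to be small. */
  public static boolean shouldCache(IOContext context, long maxCachedBytes) {
    FlushInfo flush = context.flushInfo; // null when this is not a flush
    return flush != null && flush.estimatedSegmentSize <= maxCachedBytes;
  }
}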


 NRT Caching Dir to allow for exact memory usage, better buffer allocation and 
 global cross indices control
 

 Key: LUCENE-3425
 URL: https://issues.apache.org/jira/browse/LUCENE-3425
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Shay Banon

 A discussion on IRC raised several improvements that can be made to NRT 
 caching dir. Some of the problems it currently has are:
 1. Not explicitly controlling the memory usage, which can result in overusing 
 memory (for example, large new segments being committed because refreshing is 
 too far behind).
 2. Heap fragmentation because of constant allocation of (probably promoted to 
 old gen) byte buffers.
 3. Not being able to control the memory usage across indices for multi index 
 usage within a single JVM.
 A suggested solution (which still needs to be ironed out) is to have a 
 BufferAllocator that controls allocation of byte[], and allow to return 
 unused byte[] to it. It will have a cap on the size of memory it allows to be 
 allocated.
 The NRT caching dir will use the allocator, which can either be provided (for 
 usage across several indices) or created internally. The caching dir will 
 also create a wrapped IndexOutput, that will flush to the main dir if the 
 allocator can no longer provide byte[] (exhausted).
 When a file is flushed from the cache to the main directory, it will return 
 all the currently allocated byte[] to the BufferAllocator to be reused by 
 other files.




[jira] [Created] (SOLR-2752) leader-per-shard

2011-09-09 Thread Yonik Seeley (JIRA)
leader-per-shard


 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
 Fix For: 4.0


We need to add metadata into ZooKeeper about who is the leader for each shard,
and have some kind of leader election.
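For reference, the standard ZooKeeper election recipe this could build on, as a hedged sketch (the paths and node layout are illustrative, not SolrCloud's actual layout):

import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ShardLeaderElection {
  /** Everyone creates an ephemeral-sequential node; lowest sequence leads. */
  public static boolean runForLeader(ZooKeeper zk, String electionRoot,
                                     String coreUrl)
      throws KeeperException, InterruptedException {
    String myNode = zk.create(electionRoot + "/n_", coreUrl.getBytes(),
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

    List<String> candidates = zk.getChildren(electionRoot, false);
    Collections.sort(candidates); // sequence numbers order the candidates

    // If our node has the lowest sequence number, we lead this shard;
    // otherwise we would watch the next-lower node and re-check when it dies.
    return myNode.endsWith(candidates.get(0));
  }
}

Because the nodes are ephemeral, a crashed leader's node disappears and the next candidate takes over automatically.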




issue SOLR-1565 - StreamingUpdateSolrServer and Javabin

2011-09-09 Thread Patrick Sauts
Hi,

I've made an alpha version of StreamingUpdateSolrServer dedicated to binary
updates (javabin); it works fine for me.

It is not a fix for the issue SOLR-1565, it is a new class.
But I think it may be useful for fixing the bug.

If somebody tests it, please send feedback.

Patrick Sauts.



BinaryStreamingUpdateSolrServer.java
Description: Binary data

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-09-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1979:
--

Fix Version/s: (was: 3.4)
   3.5

Moving to 3.5

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
  Labels: UpdateProcessor
 Fix For: 3.5

 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.
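The core of what the processor wraps, as a short sketch against the Tika API (the field-suffix convention here is illustrative):

import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.language.LanguageIdentifier;

public class LangIdSketch {
  /** Detect the language of one field, then remap it to a language-specific field. */
  public static void mapByLanguage(SolrInputDocument doc, String field) {
    String text = (String) doc.getFieldValue(field);
    String lang = new LanguageIdentifier(text).getLanguage(); // e.g. "en", "no"
    doc.addField(field + "_" + lang, text); // e.g. title -> title_en
  }
}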




[jira] [Updated] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)

2011-09-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2742:
--

Fix Version/s: (was: 3.4)
   3.5

 Add commitWithin to convenience signatures for SolrServer.add(..)
 -

 Key: SOLR-2742
 URL: https://issues.apache.org/jira/browse/SOLR-2742
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: SolrJ, commitWithin
 Fix For: 3.5, 4.0

 Attachments: SOLR-2742.patch, SOLR-2742.patch, SOLR-2742.patch


 Today you need to manually create an UpdateRequest in order to set the
 commitWithin value.
 We should provide an optional commitWithin parameter on all
 SolrServer.add(..) methods as a convenience.
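Today's workaround, per the description, next to the proposed convenience (the exact new signature is still to be decided):

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinToday {
  /** The manual route this issue wants to shortcut to server.add(doc, 10000). */
  public static void addCommitWithin(SolrServer server, SolrInputDocument doc)
      throws SolrServerException, IOException {
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(10000); // ask Solr to commit within 10 seconds
    req.process(server);
  }
}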




[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-09-09 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101612#comment-13101612
 ] 

Lance Norskog commented on SOLR-1979:
-

I'm impressed! This is a lot of work and empirical testing for a difficult 
problem.

Comments:
There are a few parameters that are true/false, but in the future you might 
want a third answer. It might be worth making the decision via a keyword so you 
can add new keywords later.

About the multiple languages in one field problem: you can't solve everything 
at once. The other document analysis components like UIMA should be able to 
identify parts of documents, and then you use this on one part at a time. This 
is the point of a modular toolkit: you combine the tools to solve advanced 
problems.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
  Labels: UpdateProcessor
 Fix For: 3.5

 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.




Re: issue SOLR-1565 - StreamingUpdateSolrServer and Javabin

2011-09-09 Thread Ryan McKinley
Hey Patrick-

Can you make a JIRA issue, and post the code there?

Thanks
Ryan


On Fri, Sep 9, 2011 at 5:47 PM, Patrick Sauts patrick.via...@gmail.com wrote:
 Hi,



 I’ve made a alpha version of StreamingUpdateSolrServer dedicated to Binary
 update (javabin), It works fine for me.



 It is not a fix of the issue SOLR-1565, it is a new class.

 But I think It can maybe be useful to fix the bug.



 If somebody tests it thank you to send feedback.



 Patrick Sauts.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1565) StreamingUpdateSolrServer should support RequestWriter API

2011-09-09 Thread Patrick Sauts (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Sauts updated SOLR-1565:


Attachment: BinaryStreamingUpdateSolrServer.java

I've made an alpha version of StreamingUpdateSolrServer dedicated to binary
update using javabin; it works fine for me.

It is not a fix for the issue SOLR-1565, it is a new class.
But I think it may be useful for fixing the bug.

If somebody tests it, please send feedback.

Patrick Sauts

 StreamingUpdateSolrServer should support RequestWriter API
 --

 Key: SOLR-1565
 URL: https://issues.apache.org/jira/browse/SOLR-1565
 Project: Solr
  Issue Type: Improvement
  Components: clients - java, update
Affects Versions: 1.4
Reporter: Shalin Shekhar Mangar
 Fix For: 3.4, 4.0

 Attachments: BinaryStreamingUpdateSolrServer.java


 StreamingUpdateSolrServer is hard-coded to write XML data. It should 
 integrate the RequestWriter API so that it can be used to send binary update 
 payloads.
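What integrating the RequestWriter API should allow, as a hedged sketch against the SolrJ API (before a fix, the streaming client ignores the configured writer and always sends XML):

import java.net.MalformedURLException;

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class JavabinStreaming {
  /** Build a streaming client that should send javabin update payloads. */
  public static StreamingUpdateSolrServer create() throws MalformedURLException {
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
    server.setRequestWriter(new BinaryRequestWriter()); // javabin, not XML
    return server;
  }
}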




[jira] [Issue Comment Edited] (SOLR-1565) StreamingUpdateSolrServer should support RequestWriter API

2011-09-09 Thread Patrick Sauts (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101648#comment-13101648
 ] 

Patrick Sauts edited comment on SOLR-1565 at 9/9/11 11:36 PM:
--

I've made an alpha version of StreamingUpdateSolrServer dedicated to binary
update using javabin; it works fine for me.
I have attached it to this issue as BinaryStreamingUpdateSolrServer.java.

It is not a fix for the issue SOLR-1565, it is a new class.
But I think it may be useful for fixing the bug.

If somebody tests it, please send feedback.

Patrick Sauts

  was (Author: pathedog):
I’ve made a alpha version of StreamingUpdateSolrServer dedicated to Binary 
update using javabin, It works fine for me.

It is not a fix of the issue SOLR-1565, it is a new class.
But I think It can maybe be useful to fix the bug.

If somebody tests it thank you to send feedback.

Patrick Sauts
  
 StreamingUpdateSolrServer should support RequestWriter API
 --

 Key: SOLR-1565
 URL: https://issues.apache.org/jira/browse/SOLR-1565
 Project: Solr
  Issue Type: Improvement
  Components: clients - java, update
Affects Versions: 1.4
Reporter: Shalin Shekhar Mangar
 Fix For: 3.4, 4.0

 Attachments: BinaryStreamingUpdateSolrServer.java


 StreamingUpdateSolrServer is hard-coded to write XML data. It should 
 integrate the RequestWriter API so that it can be used to send binary update 
 payloads.




RE: issue SOLR-1565 - StreamingUpdateSolrServer and Javabin

2011-09-09 Thread Patrick Sauts
I have attached the file to the issue SOLR-1565.
Hoping this will help.

Patrick Sauts.

-Original Message-
From: Ryan McKinley [mailto:ryan...@gmail.com] 
Sent: Friday, September 09, 2011 3:57 PM
To: dev@lucene.apache.org
Subject: Re: issue SOLR-1565 - StreamingUpdateSolrServer and Javabin

Hey Patrick-

Can you make a JIRA issue, and post the code there?

Thanks
Ryan


On Fri, Sep 9, 2011 at 5:47 PM, Patrick Sauts patrick.via...@gmail.com
wrote:
 Hi,



 I've made a alpha version of StreamingUpdateSolrServer dedicated to 
 Binary update (javabin), It works fine for me.



 It is not a fix of the issue SOLR-1565, it is a new class.

 But I think It can maybe be useful to fix the bug.



 If somebody tests it thank you to send feedback.



 Patrick Sauts.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For 
 additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org