Re: Solr UIMA Custom Annotator PEAR file installation on Linux

2016-01-08 Thread Tommaso Teofili
Hi, do you mean you want to use a PEAR to provide the Annotator for the Solr UIMA UpdateProcessor? Can you please detail your needs a bit more? Regards, Tommaso 2016-01-08 1:57 GMT+01:00 techqnq : > implemented custom annotator and generated the PEAR file. > Windows has the PEAR installer util

Re: Using SimpleNaiveBayesClassifier in solr

2015-10-12 Thread Tommaso Teofili
Hi Yewint, the SNB classifier is not an online one, so you should retrain it every time you want to update it. What you pass to the Classifier is a Reader, therefore you should make sure it stays accessible (do not close it) for classification to work. Regarding performance, SNB becomes slower

Re: /suggest through SolrJ?

2015-04-29 Thread Tommaso Teofili
2015-04-27 19:22 GMT+02:00 Alessandro Benedetti : > Just had the very same problem, and I confirm that currently is quite a > mess to manage suggestions in SolrJ ! > I have to go with manual Json parsing. > or very not nice NamedList API mess (see an example in JR Oak [1][2]). Regards, Tommaso

Re: solr uima and opennlp

2015-05-21 Thread Tommaso Teofili
Hi Andreea, 2015-05-21 18:12 GMT+02:00 hossmaa : > Hi everyone > > I'm trying to plug in a new UIMA annotator into solr. What is necessary for > this? Is it enough to build a Jar similarly to the ones from the > uima-addons > package? Yes, exactly. Actually you just need a jar containing the An

Re: solr uima and opennlp

2015-06-01 Thread Tommaso Teofili
yeah, I think you'd rather post it to d...@uima.apache.org . Regards, Tommaso 2015-05-28 15:19 GMT+02:00 hossmaa : > Hi Tommaso > > Thanks for the quick reply! I have another question about using the > Dictionary Annotator, but I guess it's better to post it separately. > > Cheers > Andreea > >

Re: Knn classifier doesn't work

2017-09-02 Thread Tommaso Teofili
It sounds like none of the docs in your index has the "class" field (in your case Tags), whereas classification needs some bootstrapping (add some examples of correctly classified docs to the index beforehand). On the other hand, the naive Bayes implementation definitely has a bug, as the MultiFi
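To illustrate why a k-nearest-neighbour classifier cannot assign any class until some correctly labelled documents exist, here is a toy sketch in plain Java. It is not Lucene's classification module: the class name ToyKnnClassifier is hypothetical, and it ranks neighbours by Jaccard overlap of term sets rather than by a MoreLikeThis query against the index.

```java
import java.util.*;
import java.util.stream.*;

/** Toy k-NN text classifier (hypothetical; not the Lucene classification API). */
final class ToyKnnClassifier {
    private record Example(Set<String> terms, String label) {}

    private final List<Example> examples = new ArrayList<>();
    private final int k;

    ToyKnnClassifier(int k) { this.k = k; }

    /** Bootstrapping step: add a document whose class is already known. */
    void addLabeled(String text, String label) {
        examples.add(new Example(tokenize(text), label));
    }

    /** Majority label among the k most similar labelled examples, or empty if there are none. */
    Optional<String> classify(String text) {
        if (examples.isEmpty()) {
            return Optional.empty(); // no labelled docs: nothing to vote, so no class
        }
        Set<String> query = tokenize(text);
        Map<String, Long> votes = examples.stream()
            .sorted(Comparator.comparingDouble((Example e) -> -jaccard(query, e.terms())))
            .limit(k)
            .collect(Collectors.groupingBy(Example::label, Collectors.counting()));
        return votes.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    private static Set<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
            .filter(t -> !t.isEmpty())
            .collect(Collectors.toSet());
    }

    private static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }
}
```

With an empty training set every call returns an empty result, which mirrors the "no class field in any doc" situation described above.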

Re: multi language search engine in solr

2017-09-11 Thread Tommaso Teofili
Another thing to consider is what users would expect: would English users search over English docs only? If yes, the most important task would be to set up / create accurate per-language analyzers; otherwise you may also consider adopting machine translation, either on the search queries

Re: Knn classifier doesn't work

2017-09-19 Thread Tommaso Teofili
Hi Alessandro, yes please, feel free to open a Jira issue, patches welcome! Tommaso On Mon 18 Sep 2017 at 14:30, alessandro.benedetti < a.benede...@sease.io> wrote: > Hi Tommaso, > you are definitely right! > I see that the method : MultiFields.getTerms > returns : > if (term

Re: AEM SOLR integration

2017-09-24 Thread Tommaso Teofili
Integration can be done in AEM at different layers; however, my suggestion would be to enable it at the repository (Oak) level [1] so that the usual AEM search would also take ACLs into account. [1] : http://jackrabbit.apache.org/oak/docs/query/solr.html On Fri 22 Sep 2017 at 18:47, Davi

Re: Exception during integration of Solr with UIMA

2017-03-20 Thread Tommaso Teofili
Hi, the UIMA OpenCalais Annotator you're using refers to an old endpoint which is no longer available, see log line [1]. I would suggest simply removing the OpenCalaisAnnotator entry from your UIMAUpdateRequestProcessor configuration in solrconfig.xml. More generally, you should put only the UIMA

Re: Caching requests to Solr

2014-03-08 Thread Tommaso Teofili
following up on this, I've created https://issues.apache.org/jira/browse/SOLR-5826 , with a draft patch. Regards, Tommaso 2014-03-05 8:50 GMT+01:00 Tommaso Teofili : > Hi all, > > I have the following requirement where I have an application talking to > Solr via SolrJ where I d

Re: [Clustering] Full-Index Offline cluster

2014-03-10 Thread Tommaso Teofili
Hi Ahmet, Ale, right, there's a classification module for Lucene (and therefore usable in Solr as well), but no clustering support there. Regards, Tommaso 2014-03-10 19:15 GMT+01:00 Ahmet Arslan : > Hi, > > Thats weird. As far as I know there is no such thing. There is > classification stuff b

deep paging without sorting / keep IRs open

2014-05-15 Thread Tommaso Teofili
Hi all, in one use case I'm working on [1] I am using Solr in combination with an MVCC system [2][3], so that the (Solr) index is kept up to date with the system and must handle search requests that are tied to a certain state / version of it and of course multiple searches based on different versi

Re: deep paging without sorting / keep IRs open

2014-05-19 Thread Tommaso Teofili
thanks Yonik, that looks promising, I'll have a look at it. Tommaso 2014-05-17 17:57 GMT+02:00 Yonik Seeley : > On Sat, May 17, 2014 at 10:30 AM, Yonik Seeley > wrote: > > I think searcher leases would fit the bill here? > > https://issues.apache.org/jira/browse/SOLR-2809 > > > > Not yet imple

Re: Integrate solr with openNLP

2014-06-04 Thread Tommaso Teofili
Hi all, Ahmet was suggesting the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug your

Tika analyzers

2014-07-30 Thread Tommaso Teofili
Hi all, while SolrCell works nicely when binary documents need to be indexed, I am wondering about the possibility of having Lucene / Solr documents that have binaries in specific Lucene fields, e.g. title="a nice doc", name="blabla.doc", binary="0x1234...". In that case the "binary" field should

Re: Alternatives to GATE?

2014-01-16 Thread Tommaso Teofili
If you need a framework to build your enhancement pipeline on I think Apache UIMA [1] is good as it's also able to store annotated documents into Lucene and Solr so it may be a good fit for your needs. Just consider that you have to learn how to use / develop on top of it, it's not a big deal but n

Caching requests to Solr

2014-03-04 Thread Tommaso Teofili
Hi all, I have the following requirement: I have an application talking to Solr via SolrJ where I don't know upfront which type of Solr instance it will be communicating with. While this is easily solvable by using different SolrServer implementations, I also need a way to ensure that all th
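One way to add client-side caching regardless of which SolrServer implementation is plugged in is a decorator around the client. The sketch below is hypothetical: SearchBackend stands in for the SolrJ client, queries are plain strings and results are lists of ids. A real version would key on the full set of request parameters and clear the cache after commits.

```java
import java.util.*;

/** Hypothetical minimal search abstraction standing in for a SolrJ server/client. */
interface SearchBackend {
    List<String> search(String query);
}

/** Decorator that answers repeated queries from a bounded in-memory LRU cache. */
final class CachingSearchBackend implements SearchBackend {
    private final SearchBackend delegate;
    private final Map<String, List<String>> cache;

    CachingSearchBackend(SearchBackend delegate, int maxEntries) {
        this.delegate = delegate;
        // access-ordered LinkedHashMap that evicts its eldest entry: simple LRU behaviour
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized List<String> search(String query) {
        // only reaches the real backend on a cache miss
        return cache.computeIfAbsent(query, delegate::search);
    }

    /** Call after the index changes (e.g. a commit) so stale results are not served. */
    synchronized void invalidate() {
        cache.clear();
    }
}
```

Because callers only see the SearchBackend interface, the same caching layer applies whatever concrete client is chosen at runtime.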

Re: Issue with multivalued fields in UIMA

2014-08-29 Thread Tommaso Teofili
Hi, it'd be good if you could open a Jira issue (preferably with a patch) describing your findings. Thanks, Tommaso 2014-08-29 18:34 GMT+02:00 mkhordad : > I solved it. It was caused by a bug in UIMAUpdateRequestProcessor. > > > > -- > View this message in context: > http://lucene.472066.n3.n

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Tommaso Teofili
Hi, you may leverage and / or improve the MLT component [1]. HTH, Tommaso [1] : http://wiki.apache.org/solr/MoreLikeThis 2013/7/23 Furkan KAMACI > Hi; > > Sometimes a huge part of a document may exist in another document. As like > in student plagiarism or quotation of a blog post at another b

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-23 Thread Tommaso Teofili
entually train a classifier to help you mark other texts as quote / plagiarism HTH, Tommaso 2013/7/23 Furkan KAMACI > Actually I need a specialized algorithm. I want to use that algorithm to > detect duplicate blog posts. > > 2013/7/23 Tommaso Teofili > > > Hi, > >

Re: Too slow UIMA with Solr

2013-08-29 Thread Tommaso Teofili
Hi Jun, I agree the AE (instead of the AEProvider) should be cached on the UpdateRequestProcessor. In previous revisions [1] it was cached directly by the BasicAEProvider, so there was no need for that in the UIMAUpdateRequestProcessor, but since that has changed, I agree it should be done there a
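The caching idea can be sketched generically: create the expensive engine once, lazily, and reuse it for every document instead of re-creating it per request. The classes below are hypothetical stand-ins (ExpensiveEngine plays the role of a UIMA AnalysisEngine), and the sketch assumes the engine is not safe for concurrent use, so calls are serialized.

```java
import java.util.Locale;
import java.util.function.Supplier;

/** Hypothetical stand-in for an expensive-to-initialize analysis engine. */
final class ExpensiveEngine {
    static int instancesCreated = 0;

    ExpensiveEngine() {
        instancesCreated++; // imagine descriptor parsing and model loading here
    }

    String process(String text) {
        return text.toUpperCase(Locale.ROOT);
    }
}

/** Creates the engine on first use and reuses that single instance afterwards. */
final class CachedEngineHolder {
    private final Supplier<ExpensiveEngine> factory;
    private ExpensiveEngine engine; // guarded by this

    CachedEngineHolder(Supplier<ExpensiveEngine> factory) {
        this.factory = factory;
    }

    /** Assumes the engine is not thread safe, so access is synchronized. */
    synchronized String process(String text) {
        if (engine == null) {
            engine = factory.get();
        }
        return engine.process(text);
    }
}
```

Holding one such object per processor factory keeps initialization cost out of the per-document path; a pool would be the alternative if serialized access became a bottleneck.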

Re: Too slow UIMA with Solr

2013-08-29 Thread Tommaso Teofili
p.s. see https://issues.apache.org/jira/browse/SOLR-5201 2013/8/29 Tommaso Teofili > Hi Jun, > > I agree the AE (instead of the AEProvider) should be cached on the > UpdateRequestProcessor. > In previous revisions [1] it was cached directly by the BasicAEProvider so > there w

Re: solr UIMA exception

2011-08-29 Thread Tommaso Teofili
The UIMA AlchemyAPI annotator is failing for you due to an error on the server side, and I think you should look at your Solr UIMA configuration, as it seems you wanted to extract entities from text: "Senator Dick Durbin (D-IL) Chicago , March 3,2007." while the error says "org.apache.solr.uima.processor

Different Solr versions between Master and Slave(s)

2011-09-19 Thread Tommaso Teofili
Hi all, while thinking about a migration plan of a Solr 1.4.1 master / slave architecture (1 master with N slaves already in production) to Solr 3.x I imagined to go for a graceful migration, starting with migrating only one/two slaves, making the needed tests on those while still offering the inde

Re: UIMA DictionaryAnnotator partOfSpeach

2011-09-28 Thread Tommaso Teofili
I think one problem is that the featurePath is not set correctly. Note that you are assuming PoS tags are written in some annotation feature, so this means you should have set up the UIMA pipeline to also include, for example, the HMM Tagger [1], which adds (by default) the posTag feature to TokenAn

Re: Upgrading the Index from 1.4.1 to 3.4 using replication

2011-10-27 Thread Tommaso Teofili
I don't think it'll work, as I've tried this approach myself and the blocking issue was that Solr 1.4.1 uses a different javabin version than Solr 3.4 (I think it's 1 vs 2), so the master and the slave(s) can't communicate using the standard replication handler and thus can't exchange information and data

Re: Document Processing

2011-12-06 Thread Tommaso Teofili
Hello Michael, I can help you with using the UIMA UpdateRequestProcessor [1]; the current implementation uses in-memory execution of UIMA pipelines but since I was planning to add the support for higher scalability (with UIMA-AS [2]) that may help you as well. Tommaso [1] : http://svn.apache.org

Re: Problems with SolrUIMA

2011-12-10 Thread Tommaso Teofili
Hello Adriana, your configuration looks fine to me. The exception you pasted makes me think you're using a Solr instance at a certain version (3.4.0) while the Solr-UIMA module jar is at a different version; I remember there has been a change in the UpdateRequestProcessorFactory API at some point

Re: How to get the time document was indexed?

2012-01-20 Thread Tommaso Teofili
Hi Alex, you can create a field in the schema.xml of type date or tdate called (something like) idx_timestamp and set its default option to NOW then you won't have to add any extra fields to the documents because it will be automatically created when documents are indexed. Hope it helps. Tommaso 2

Re: Tag generation

2010-07-15 Thread Tommaso Teofili
Hi all, in UIMA there are two components which wrap OpenCalais [1] and AlchemyAPI [2][3] services that you could use, then you could also add something else to the tagging pipeline (using existing stuff [4] or implementing your own logic). Hope this helps. Tommaso [1] : http://uima.apache.org/sand

Re: Solr Best Version

2010-07-16 Thread Tommaso Teofili
Hi all, I read in a previous thread [1] that also the branch3.x version could be a good choice, but I don't know what differences exist at the moment between the two versions and how stable branch3.x is. Maybe someone else could point these things out. My 0.0002 cents. Tommaso [1] : http://markmai

Re: solrconfig.xml and xinclude

2010-07-22 Thread Tommaso Teofili
Hi, I am trying to do a similar thing within the schema.xml (using Solr 1.4.1), having a (super)schema that is common to 2 instances and specific fields I would like to include (with XInclude). Something like this: * ... ... * and it works with the sp

Re: solrconfig.xml and xinclude

2010-07-22 Thread Tommaso Teofili
all. Any other ideas? Cheers, Tommaso 2010/7/22 Tommaso Teofili > Hi, > I am trying to do a similar thing within the schema.xml (using Solr 1.4.1), > having a (super)schema that is common to 2 instances and specific fields I > would like to include (with XInclude). &

Re: Problem with Pdf, Sol 1.4.1 Cell

2010-07-26 Thread Tommaso Teofili
Hi, I think there is an open bug for it at: https://issues.apache.org/jira/browse/SOLR-1902 Using Solr 1.4.1 and upgrading the Tika libraries to 0.8 snapshot, I also had to upgrade pdfbox, fontbox and jempbox to 1.2.1; I got no errors and it seems able to index PDFs without any errors (I can query t

Re: slave index is bigger than master index

2010-07-26 Thread Tommaso Teofili
Hi, I think that you may be using a Lucene/Solr IndexDeletionPolicy that does not remove old commits (and you aren't propagating solrconfig.xml via replication). You can configure this feature in solrconfig.xml inside the tag: * 1 0 * I hope this can be help

Re: Any tips/guidelines to turning the Solr/luence performance in a master/slave/sharding environment

2010-07-28 Thread Tommaso Teofili
Hi, I think the starting point should be : http://wiki.apache.org/solr/SolrPerformanceFactors For example you could start playing with the mergeFactor parameter. My 2 cents, Tommaso 2010/7/27 Chengyang > How to reduce the index files size, decreate the sync time between each > nodes. decrease th

Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Tommaso Teofili
I attached a patch for Solr 1.4.1 release on https://issues.apache.org/jira/browse/SOLR-1902 that made things work for me. This strange behaviour for me was due to the fact that I copied the patched jars and war inside the dist directory but forgot to update the war inside the example/webapps direc

Re: Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-28 Thread Tommaso Teofili
it be more stable to stick with 1.4.1 and your patch > to > > get to Tika 0.8, or to stick with the 4.0 trunk version? > > > > Best, > > Dave > > > > -Original Message- > > From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] > > Sent: Wed

Re: Require some advice

2010-08-20 Thread Tommaso Teofili
Hi Pavan, you may want to plug UIMA in as a particular UpdateRequestProcessor [1] while indexing data (I am working on such a use case). This way you could extract entities and add them either as dynamicFields or as predefined (fixed) fields. 2010/8/12 Michael Griffiths > > While there are some decen

Re: SolrException log

2010-08-23 Thread Tommaso Teofili
Hi Bastian, this seems to be related to IO and file deletion (optimization compacts and removes index files). Are you running Solr on NFS or a distributed file system? You could set a proper IndexDeletionPolicy (SolrDeletionPolicy) in solrconfig.xml to handle this. My 2 cents, Tommaso 2010/8/11 B

Re: SolrException log

2010-08-25 Thread Tommaso Teofili
port back if that solved the > problem, thank you for the hint. > > cheers, > Bastian > > -Original Message- > From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] > Sent: Monday, 23 August 2010 15:31 > To: solr-user@lucene.apache.org > Subject:

Index with ItalianStemmer

2010-09-03 Thread Tommaso Teofili
Hi all, I am experiencing a strange behavior while indexing Italian text (an indexed, not stored text field) when stemming with the Italian language: generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseC

Re: Index with ItalianStemmer

2010-09-04 Thread Tommaso Teofili
Thanks Robert for this hint, the problem was exactly that I needed to define the right stemmer at query time too. Best regards, Tommaso 2010/9/3 Robert Muir > On Fri, Sep 3, 2010 at 8:04 AM, Tommaso Teofili > wrote: > > > Does anyone know what could be the root cause or

Solr UIMA integration

2010-09-20 Thread Tommaso Teofili
Hi all, I am working on integrating Apache UIMA as an UpdateRequestProcessor for Apache Solr and I am now at the first working snapshot. I put the code on GoogleCode [1] and you can take a look at the tutorial [2]. I would be glad to donate it to the Apache Solr project, as I think it could be a u

Re: Solr UIMA integration

2010-09-21 Thread Tommaso Teofili
e basis? It could be done as follows: > > concept > concept > concept > ... > > > Thanks for this nice suggestion, I will put it in the TODO list :-) Regards, Tommaso > On 20. sep. 2010, at 12.35, Tommaso Teofili wrote: > > > Hi all, > > I am workin

Re: Solr UIMA integration

2010-09-24 Thread Tommaso Teofili
Hi Maheshkumar, I never had this one before, which version of UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator, Tagger, WhitespaceTokenizer) are you using? It should be 2.3.1-SNAPSHOT. Which version of Solr? It seems that there is a problem in Tagger reading its model (to gene

Re: Solr UIMA integration

2010-09-27 Thread Tommaso Teofili
Hi Maheshkumar, I attached a patch for inclusion of this project as a Solr contrib module [1] , there you can find the patch to apply to the Solr trunk along with needed jars (attached as a zip archive). I think that your issue could be related to the fact that GC project dependency is from Solr 1.

Re: Solr UIMA integration

2010-10-01 Thread Tommaso Teofili
Hi Mahesh , 2010/10/1 maheshkumar > > Thanks a lot for uploading the relevant dependencies jars. The issue was > bcoz of java heap size i increased the heap and the issue was resolved. > > I am happy it solved your issue. > Now i am getting 403 error while connecting to > http://api.opencalais

Re: Solr UIMA integration

2010-10-05 Thread Tommaso Teofili
Hi Mahesh, here your AlchemyAPI calls are failing, in fact their status is ERROR (sent by AlchemyAPI webservice itself) so you should try your service call outside Solr/UIMA, for example from their website and see if and why it's failing with the text you're trying to enrich. However you can post h

Re: Solr UIMA integration

2010-10-06 Thread Tommaso Teofili
Hi Mahesh, the issue here is that you're not sending a ... to Solr from which UIMAUpdateRequestProcessor extracts text to analyze :) In fact, by default UIMAUpdateRequestProcessor extracts the text to analyze from that field and sends that value to a UIMA pipeline. Obviously you could choose to customize

Re: Can anyone compare Solr with Autonomy?

2010-10-08 Thread Tommaso Teofili
Hi Scott, I can say that in my experience I've seen a company drop the Autonomy solution in favour of Apache Solr :-) It's not a comparison, nor a matter of better/worse, but it can count when evaluating how the market is behaving in that regard. Tommaso 2010/10/8 Otis Gospodnetic > Scott, > >

Re: OutOfMemory and auto-commit

2010-10-29 Thread Tommaso Teofili
If the problem is autowarming queries running in the meantime maybe you could consider changing set to true the following: false and/or change this value 2 another option would be lowering the value of autowarmCount inside the cache definitions. Hope this helps. Tommaso 2010/10/25 Jona

Re: RAM increase

2010-10-29 Thread Tommaso Teofili
Hello Lance, from the command line run: > export JAVA_OPTS='-d64 -Xms128m -Xmx5g' adjusting the Xms and Xmx values as needed. Hope this helps. Tommaso 2010/10/29 Lance Norskog > When you start the Tomcat app, you tell it how much memory to allocate > to the JVM. I don't remember where, probably
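To confirm that an -Xmx setting actually reached a JVM, a tiny check like the following can be run with the same options (the class name HeapCheck is hypothetical). Runtime.maxMemory() reports the maximum heap the JVM will attempt to use, which is usually close to, though not always exactly, the -Xmx value.

```java
/** Prints the maximum heap the running JVM will attempt to use. */
final class HeapCheck {
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

For example, `java -Xmx5g HeapCheck` should print a value of roughly 5 GiB; logging the same call from inside the webapp would show what the container's JVM actually received.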

Re: Exception while processing: attach document

2010-10-29 Thread Tommaso Teofili
I think this is a JDBC warning message, since some isolation levels may not be implemented by the actual (Oracle) implementation (e.g.: READ_UNCOMMITTED). Could your issue be related to some transactions updating/inserting/deleting records on your Oracle DB while trying to run DIH? Regards, Tommaso 2

mergeFactor questions

2010-11-04 Thread Tommaso Teofili
Hi all, Having read the SolrPerformanceFactors wiki page [1], I'd still need a couple of clarifications about mergeFactor (I am using version 1.4.1) so if anyone can help it would be nice. - Is mergeFactor a one time configuration setting that is considered only when creating the index for t

querying multiple fields as one

2010-11-04 Thread Tommaso Teofili
Hi all, having two fields named 'type' and 'cat' with identical type and options, but different values recorded, would it be possible to query them as they were one field? For instance q=type:electronics cat:electronics should return same results as q=common:electronics I know I could make it def
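Assuming a client-side rewrite is acceptable, one way to treat several fields as one is to expand the term into an OR over each field; with the default OR operator this is what `q=type:electronics cat:electronics` already does. The helper below is a hypothetical sketch; its escape method covers the usual Lucene query-syntax special characters (SolrJ also ships `ClientUtils.escapeQueryChars` for this). Changing the schema so that both fields are copied into one common field is the server-side alternative.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MultiFieldQuery {
    /** Builds "f1:term OR f2:term ..." so one term is matched against several fields. */
    static String anyField(List<String> fields, String term) {
        String escaped = escape(term);
        return fields.stream()
            .map(f -> f + ":" + escaped)
            .collect(Collectors.joining(" OR "));
    }

    /** Backslash-escapes characters that are special in the Lucene query syntax. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&/; ".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

As noted in the thread, scoring will not be identical to a single combined field, since each field keeps its own term statistics.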

Re: querying multiple fields as one

2010-11-04 Thread Tommaso Teofili
t a big issue > right, no problem if the scoring isn't exactly the same. Thanks, Tommaso > > Best > Erick > > On Thu, Nov 4, 2010 at 8:21 AM, Tommaso Teofili > wrote: > > > Hi all, > > having two fields named 'type' and 'cat' with i

Re: mergeFactor questions

2010-11-04 Thread Tommaso Teofili
Thanks so much Shawn, I am in a scenario with many inserts while searching, each consisting of ~500 documents; I will monitor the number of segments, taking your considerations into account :-) Regards, Tommaso 2010/11/4 Shawn Heisey > On 11/4/2010 3:27 AM, Tommaso Teofili wrote: > >

Re: full text search in multiple fields

2010-11-12 Thread Tommaso Teofili
Hi, 2010/11/12 PeterKerk > > I want to provide a full text search function. > > This function has to search through the 2 fields: "title" and "description" > that I have defined in my schema.xml (both of type "string"). > > Now, since solr doesnt (by default) provide an or operator, I don't th

Re: Solr UIMA with KEA

2012-11-23 Thread Tommaso Teofili
the AlchemyAPI service is not mandatory (it's there just as an example and can be safely removed), you can use whatever service you want as long as it's wrapped by a UIMA AnalysisEngine and you specify its descriptor. See following updateChain example configuration : /path/to/KEAdescritpor.x

Re: Indexing nouns only with UIMA works - performance issue?

2013-02-04 Thread Tommaso Teofili
Thanks Kai for your feedback, I'll look into it and let you know. Regards, Tommaso 2013/2/1 Kai Gülzau > I now use the "stupid" way to use the german corpus for UIMA: copy + paste > :-) > > I modified the Tagger-2.3.1.jar/HmmTagger.xml to use the german corpus > ... > > file:german/TuebaMode

Re: Indexing nouns only with UIMA works - performance issue?

2013-02-04 Thread Tommaso Teofili
Regarding configuration parameters have a look at https://issues.apache.org/jira/browse/LUCENE-4749 Regards, Tommaso 2013/2/4 Tommaso Teofili > Thanks Kai for your feedback, I'll look into it and let you know. > Regards, > Tommaso > > > 2013/2/1 Kai Gülzau > >&g

Re: Indexing nouns only with UIMA works - performance issue?

2013-02-04 Thread Tommaso Teofili
descriptor and is then set with the given actual value. HTH, Tommaso 2013/2/4 Tommaso Teofili > Regarding configuration parameters have a look at > https://issues.apache.org/jira/browse/LUCENE-4749 > Regards, > Tommaso > > > 2013/2/4 Tommaso Teofili > >> Thanks

Re: Indexing nouns only with UIMA works - performance issue?

2013-02-05 Thread Tommaso Teofili
nceAE.xml" > tokenType="org.apache.uima.SentenceAnnotation" ngramsize="2" > modelFile="file:german/TuebaModel.dat" /> > > ??? > > Thanks, > > Kai > > > -Original Message- > From: Tommaso Teofili [mailto:tommaso.teof...@gmai

Re: which analyzer is used for facet.query?

2013-02-13 Thread Tommaso Teofili
I agree that's definitely strange, I'll have a look at it. Tommaso 2013/2/12 Chris Hostetter > > : > So it seems that facet.query is using the analyzer of type index. > : > Is it a bug or is there another analyzer type for the facet query? > > That doesn't really make any sense ... > > i don't

Re: Solr UIMA

2013-02-21 Thread Tommaso Teofili
Hi Bart, I think the only way you can do that is by reindexing, or maybe by just doing a dummy atomic update [1] to each of the documents (e.g. adding or changing a field of type 'ignored' or something like that) that weren't "tagged" by UIMA before. Regards, Tommaso [1] : http://wiki.apache.org

Re: How to do this in Solr? random result for the first few results

2012-02-09 Thread Tommaso Teofili
I think you may use/customize the query elevation component to achieve that. http://wiki.apache.org/solr/QueryElevationComponent Tommaso 2012/2/9 mtheone > Say I have a classified ads site, I want to display 2 random items (premium > ads) in the beginning of the search result and the rest are re

Re: Sorting solrdocumentlist object after querying

2012-02-09 Thread Tommaso Teofili
Hi Kashif, maybe the field collapsing feature [1] may help you with your requirement. Hope this helps, Tommaso [1] : http://wiki.apache.org/solr/FieldCollapsing

Re: proper syntax for using sort query parameter in responseHandler

2012-02-17 Thread Tommaso Teofili
Hi Mark, Having a look at that requestHandler it looks ok [1], are you experiencing any errors? If so did you check the wiki page FieldOptionsByUseCase [2], maybe that field (rankNo) options contain indexed="false" or multiValued="true"? HTH, Tommaso [1] : http://wiki.apache.org/solr/CommonQueryPa

Re: performance between ExternalFileField and Join

2012-03-01 Thread Tommaso Teofili
Also regarding the Join functionality I remember Yonik pointed out it's O(# unique terms) but I agree with Erik on the ExternalFileField as you can use it just inside a function query, for example, for boosting. Tommaso 2012/3/1 Erick Erickson > Hmmm. ExternalFileFields can only be float values,

Re: in solr how to support Document.SetBoost as lucene?

2012-03-07 Thread Tommaso Teofili
when indexing a Solr document by sending XML files via HTTP POST you can set it adding the boost element to the doc one, see http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_on_.22doc.22 If you plan to index using the java APIs (SolrJ, see http://wiki.apache.org/solr/Solrj) you can
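As an illustration of the XML route, the sketch below renders an update message whose `doc` element carries a `boost` attribute, following the UpdateXmlMessages page linked above. The class name is hypothetical, and index-time boosts were removed in later Solr versions, so this only applies to the older releases discussed here.

```java
import java.util.Map;

final class BoostedDocXml {
    /** Renders an <add> message whose <doc> carries an index-time boost attribute. */
    static String addWithBoost(Map<String, String> fields, float boost) {
        StringBuilder sb = new StringBuilder("<add><doc boost=\"").append(boost).append("\">");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** Minimal XML escaping for attribute and element content. */
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The resulting string would then be POSTed to the update handler like any other XML update message.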

Re: Reporting tools

2012-03-09 Thread Tommaso Teofili
as Gora says there is the stats component you can take advantage of or you could also use JMX directly [1] or LucidGaze [2][3] or commercial services like [4] or [5] (these are the ones I know but there may be also others), each of them with different level/type of service. Tommaso [1] : http://w

Re: Solr Monitoring / Stats

2012-03-15 Thread Tommaso Teofili
would http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/ work for your scenario? Tommaso 2012/3/12 Alex Leonhardt > Hi All, > > I was wondering if anyone knows of a free tool to use to monitor multiple > Solr hosts under one roof ? I found some non

Re: Solr with UIMA

2012-03-28 Thread Tommaso Teofili
Hi Chris, 2012/3/28 chris3001 > I am having a hard time integrating UIMA with Solr. I have downloaded the > Solr 3.5 dist and have it successfully running with nutch and tika on > windows 7 using solrcell and curl via cygwin. To begin, I copied the 6 jars > from solr/contrib/uima/lib to the work

Re: Solr with UIMA

2012-03-28 Thread Tommaso Teofili
Hi Chris, I never tried the Nutch integration, so I can't help with that. However, I'll try to reproduce your setup and will let you know what comes out for me. Tommaso 2012/3/28 chris3001 > Still not getting there on Solr with UIMA... > Has anyone taken example 1 (RoomAnnotator) and su

Re: Solr with UIMA

2012-04-04 Thread Tommaso Teofili
Hi again Chris, I finally managed to find some proper time to test your configuration. The first thing to notice is that it worked for me, assuming the following prerequisites were satisfied: - you had the jar containing the AnalysisEngine for the RoomAnnotator.xml in your libraries section (this is ac

Re: Using UIMA in Solr behind a firewall

2012-04-04 Thread Tommaso Teofili
Hello Peter, I think that is more related to the UIMA AlchemyAPIAnnotator [1] or to the AlchemyAPI services themselves [2], because Solr just uses the out-of-the-box UIMA AnalysisEngine for that. Thus it may make sense to ask on d...@uima.apache.org (or even the AlchemyAPI guys directly). HTH, Tommaso [1] :

Re: Problem with AND clause in multi core search query

2012-05-14 Thread Tommaso Teofili
The latter is supposed to work: http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=column1:"A" OR column2:"B" The first query cannot work, as there is no document in either core0 or core1 that has A in field column1 and B in field column2, but
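When such a distributed request is built programmatically, the query part needs URL encoding (spaces, quotes and colons in q). The helper below is a hypothetical sketch: it encodes only q and appends the shards value unencoded, as in the URL above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class ShardedQueryUrl {
    /** Builds a distributed select URL that queries every listed core via the shards parameter. */
    static String build(String baseCoreUrl, List<String> shards, String q) {
        return baseCoreUrl + "/select?shards=" + String.join(",", shards)
            + "&q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }
}
```

The OR query from the message becomes `q=column1%3A%22A%22+OR+column2%3A%22B%22`, which Solr decodes back to the original query string.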

Re: shard distribution of multiple collections in SolrCloud

2012-05-24 Thread Tommaso Teofili
2012/5/23 Mark Miller > Yeah, currently you have to create the core on each node...we are working > on a 'collections' api that will make this a simple one call operation. > Mark, is there a Jira for that yet? Tommaso > > We should have this soon. > > - Mark > > On May 23, 2012, at 2:36 PM, Da

Re: shard distribution of multiple collections in SolrCloud

2012-05-24 Thread Tommaso Teofili
'll take a look and try to help there. Tommaso > > > On May 24, 2012, at 4:39 AM, Tommaso Teofili wrote: > > > 2012/5/23 Mark Miller > > > >> Yeah, currently you have to create the core on each node...we are > working > >> on a 'collections'

Re: Solr with UIMA

2012-06-04 Thread Tommaso Teofili
Hi all, 2012/6/1 Jack Krupansky > Is it failing on the first document? I see "uid 5", suggests that it is > not. If not, how is this document different from the others? > > I see the exception > org.apache.uima.resource.**ResourceInitializationExceptio**n, suggesting > that some file cannot be l

Re: Levenstein Distance

2012-06-07 Thread Tommaso Teofili
During the analysis phase you could add payloads to the terms using LevensteinDistance and then use that in conjunction with a PayloadSimilarity class (see [1] for an example), or just use a custom Similarity class which uses LevensteinDistance for scoring. HTH Tommaso [1] : http://www.lucidimagin
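For reference, the edit distance itself is a short dynamic program. The sketch below is plain Java with hypothetical names; it is not Lucene's LevensteinDistance class. That class returns a length-normalized similarity rather than a raw edit count, and the similarity method below uses the same 1 - distance/maxLength idea, but its exact definition should be checked in the Lucene Javadoc.

```java
/** Classic two-row dynamic-programming edit distance plus a length-normalized similarity. */
final class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // min of insertion, deletion and substitution (or match)
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()]; // after the final swap, prev holds the last row
    }

    /** 1.0 for identical strings, decreasing toward 0.0 as they diverge. */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0f : 1.0f - (float) levenshtein(a, b) / maxLen;
    }
}
```

A value like similarity(queryTerm, indexedTerm) is the kind of per-term score that could be stored as a payload or consumed by a custom Similarity; that wiring is not shown here.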

Re: DIH full-import failure, no real error message

2010-11-17 Thread Tommaso Teofili
Hi Erik 2010/11/17 Erik Fäßler > . But until this point it is necessary to retrieve the full documents, > otherwise I'd have to re-evaluate and partly rewrite our UIMA-Pipelines. Did you see https://issues.apache.org/jira/browse/SOLR-2129 for enhancing docs with UIMA pipelines just before they

Re: special sorting

2010-11-29 Thread Tommaso Teofili
Perhaps, depending on your domain logic you could use function queries to achieve that. http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function Regards, Tommaso 2010/11/29 Papp Richard > Hello, > > I have many pages with the same content in the search result (the result > is the same for som

Dynamically change master

2010-11-30 Thread Tommaso Teofili
Hi all, in a replication environment if the host where the master is running goes down for some reason, is there a way to communicate to the slaves to point to a different (backup) master without manually changing configuration (and restarting the slaves or their cores)? Basically I'd like to be

Re: Dynamically change master

2010-11-30 Thread Tommaso Teofili
on Distribution scripts or doing some custom stuff. But, please, if you have any other advice let me know. Thanks again. Tommaso 2010/11/30 Ken Krugler > Hi Tommaso, > > > On Nov 30, 2010, at 7:41am, Tommaso Teofili wrote: > > Hi all, >> >> in a replication enviro

Re: Dynamically change master

2010-11-30 Thread Tommaso Teofili
> On Tue, 30 Nov 2010 09:18 -0800, "Ken Krugler" > wrote: > > Hi Tommaso, > > > > On Nov 30, 2010, at 7:41am, Tommaso Teofili wrote: > > > > > Hi all, > > > > > > in a replication environment if the host where the master is running >

Re: Dynamically change master

2010-12-01 Thread Tommaso Teofili
extra attribute 'masterUrl' or other attributes like 'compression' (or > any other parameter which is specified in the tag) to > do a one time replication from a master. This obviates the need for > hardcoding the master in the slave." > > HTH, Upayavira > &
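Following the wiki text quoted above, a slave can be asked to pull the index once from a different (for example backup) master by passing masterUrl to the fetchindex command. The helper below only builds that request URL; the class name and host names are hypothetical, and, as the quoted text says, this is a one-time replication.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ReplicationCommands {
    /** URL asking a slave to fetch the index once from the given master. */
    static String fetchIndexFrom(String slaveReplicationUrl, String masterReplicationUrl) {
        return slaveReplicationUrl + "?command=fetchindex&masterUrl="
            + URLEncoder.encode(masterReplicationUrl, StandardCharsets.UTF_8);
    }
}
```

Issuing that URL against each slave (for example from a monitoring script) would switch the source of a single fetch without editing the slaves' configuration.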

Re: Dynamically change master

2010-12-02 Thread Tommaso Teofili
up master. Cheers, Tommaso 2010/12/1 Tommaso Teofili > Thanks Upayavira, that sounds very good. > > p.s.: > I read that page some weeks ago and didn't get back to check on it. > > > 2010/12/1 Upayavira > >> Note, all extracted from http://wiki.apa

Re: Taxonomy and Faceting

2010-12-07 Thread Tommaso Teofili
Hi, as I made the patch I can guide you through the Solr-UIMA integration configuration, just give me some more time as I am really busy at the moment and can't deepen it. There was a mini tutorial but it's outdated, I'll update it and let you know here in a few hours. Cheers, Tommaso 2010/12/7 we

Re: Taxonomy and Faceting

2010-12-08 Thread Tommaso Teofili
Thanks Markus for helping with that; there are some changes in the configuration that need to be done. However, I've just submitted a new patch at [1] which fixes jar packaging and includes a README.txt containing the following (it's very simple): 1. copy the generated solr-uima jar and its libs (und

Re: Indexing documents with SOLR

2010-12-10 Thread Tommaso Teofili
Hi Pankaj, you can find the needed documentation right here [1]. Hope this helps, Tommaso [1] : http://wiki.apache.org/solr/ExtractingRequestHandler 2010/12/10 pankaj bhatt > Hi All, > I am a newbie to SOLR and trying to integrate TIKA + SOLR. > Can anyone please guide me, how to achieve

Re: Taxonomy and Faceting

2010-12-13 Thread Tommaso Teofili
With the SOLR-2129 patch you enable an Apache UIMA [1] pipeline to enrich documents being indexed. The base pipeline provided with the patch uses the following blocks (see OverridingParamsExtServicesAE.xml): AggregateSentenceAE OpenCalaisAnnotator TextKeywordExtractionAED

Re: Problem with multicore

2010-12-15 Thread Tommaso Teofili
Hi Jörg, I think the first thing you should check is your Ubuntu's encoding, second one is file permissions (BTW why are you sudoing?). Did you try using the bash script under example/exampledocs named "post.sh" (use it like this: 'sh post.sh *.xml') Cheers, Tommaso 2010/12/15 Jörg Agatz > Hall

Parentheses in query string

2010-12-15 Thread Tommaso Teofili
Hi all, I've just noticed a strange behavior (or, at least, one I didn't expect) when adding redundant parentheses to a query. Using the lucene query parser in Solr I get no results with the query: * ((( NOT (text:"something"))) AND date <= 2010-12-15) * while I get the expected results when the

Transparent redundancy in Solr

2010-12-15 Thread Tommaso Teofili
Hi all, me, Upayavira and other guys at Sourcesense have collected some Solr architectural views inside the presentation at [1]. For sure one can set up an architecture for failover and resiliency on the "search face" (search slaves with coordinators and distributed search) but I'd like to ask how

Solr and UIMA #2

2011-01-04 Thread Tommaso Teofili
Hi all, just a quick notice to let you know that a new component to consume UIMA objects to a (local or remote) Solr instance is available inside UIMA sandbox [1]. Note that this "writes" to Solr from UIMA pipelines (push) while in SOLR-2129 [2] Solr "asks" UIMA to extract metadata while indexing d

Re: Searchers and Warmups

2011-01-14 Thread Tommaso Teofili
Hi David, The idea is that you can define some "listeners" which make a list of queries to an IndexSearcher. In particular the firstSearcher event is related to the very first IndexSearcher being created inside the Solr instance while the newSearcher is the event related to the creation of a new In

Re: solr - uima error

2011-01-29 Thread Tommaso Teofili
Hi Darx, you need to run 'ant dist' under solr/contrib/uima and then reference the created jar (under solr/contrib/uima/build) inside the solrconfig.xml ( tag) of your instance. Hope this helps, Tommaso 2011/1/29 Darx Oman > I tried to do the uima integration with solr > I followed the steps in t
