Code for getting distinct facet counts across shards(Distributed Process).

2011-06-09 Thread rajini maski
In Solr 1.4.1, the piece of code added for getting the count of distinct facet terms across shards (distributed process) is as follows: Class: FacetComponent.java Function: finishStage(ResponseBuilder rb) for (DistribFieldFacet dff :

Re: Displaying highlights in formatted HTML document

2011-06-09 Thread Ahmet Arslan
--- On Thu, 6/9/11, Bryan Loofbourrow bloofbour...@knowledgemosaic.com wrote: From: Bryan Loofbourrow bloofbour...@knowledgemosaic.com Subject: Displaying highlights in formatted HTML document To: solr-user@lucene.apache.org Date: Thursday, June 9, 2011, 2:14 AM Here is my use case:

wrong index version of solr3.2?

2011-06-09 Thread Bernd Fehling
After switching to solr 3.2 and building a new index from scratch I ran check_index which reports: Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] Why do I get FORMAT_3_1 and Lucene 3.1, anything wrong with my index? from my schema.xml: schema name=my_solr320_schema

Re: Multiple Values not getting Indexed

2011-06-09 Thread Stefan Matheis
Pawan, just separating multiple values by comma does not make them multi-value in solr-speak. But if you're already using DIH, you may try the http://wiki.apache.org/solr/DataImportHandler#RegexTransformer to 'splitBy' the field and get the expected field-values Regards Stefan On Thu, Jun 9,

Re: Code for getting distinct facet counts across shards(Distributed Process).

2011-06-09 Thread Bill Bell
I have coded and tested this and it appears to work. Are you having any problems? On 6/9/11 12:35 AM, rajini maski rajinima...@gmail.com wrote: In solr 1.4.1, for getting distinct facet terms count across shards, The piece of code added for getting count of distinct facet terms across

Re: Multiple Values not getting Indexed

2011-06-09 Thread Bill Bell
Is there a way to splitBy and trim the field after splitting? I know I can do it with Javascript in DIH, but how about using the regex parser? On 6/9/11 1:18 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Pawan, just separating multiple values by comma does not make them multi-value
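The split-and-trim question above can be illustrated outside Solr. The following is a minimal Python sketch, not DIH configuration: the function name `split_and_trim` and the sample input are hypothetical. It shows that a delimiter pattern which also consumes surrounding whitespace trims the values as a side effect of splitting. Assuming `splitBy` accepts a regular expression, a similar pattern might work there, but that is untested here.

```python
import re

def split_and_trim(raw, pattern=r"\s*,\s*"):
    """Split a delimited string and drop whitespace around each value."""
    # Stripping the ends first handles leading/trailing padding;
    # the pattern swallows whitespace on both sides of every comma.
    parts = re.split(pattern, raw.strip())
    # Discard empty values produced by doubled or trailing delimiters.
    return [p for p in parts if p]

values = split_and_trim(" red , green,blue ,, ")
print(values)
```

Each element of the resulting list would then be sent as a separate value of the multi-valued field.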

Re: Multiple Values not getting Indexed

2011-06-09 Thread Bill Bell
You have to take the input and splitBy something like , to get it into an array and reposted back to Solr... I believe others have suggested that? On 6/8/11 10:14 PM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I am trying to index 2 fields with multiple values. BUT, it is only putting 1

Solr monitoring: Newrelic

2011-06-09 Thread roySolr
Hello, I found this tool to monitor Solr queries, cache etc.: http://newrelic.com/ I have some problems with the installation of it. I get the following errors: Could not locate a Tomcat, Jetty or JBoss instance in /var/www/sites/royr Try re-running the install command

Re: Solr monitoring: Newrelic

2011-06-09 Thread Sujatha Arun
You need to install the New Relic folder under the Tomcat folder, in case the app server is Tomcat. Then, from the command line, you need to run the install command given on the New Relic site from your newrelic folder. Once this is done, restart the app server and you should be able to see a log file

Re: Solr monitoring: Newrelic

2011-06-09 Thread roySolr
I use Jetty, it's standard in the solr package. Where can i find the jetty folder? then i can start this command: java -jar newrelic.jar install

Re: Displaying highlights in formatted HTML document

2011-06-09 Thread lboutros
Hi Bryan, how do you index your html files ? I mean do you create fields for different parts of your document (for different stop words lists, stemming, etc) ? with DIH or solrj or something else ? iorixxx, could you please explain a bit more your solution, because I don't see how your solution

Re: Solr monitoring: Newrelic

2011-06-09 Thread Sujatha Arun
There is no jetty folder in the standard package, but the jetty war file is under the example/lib folder, so this is where you need to put the newrelic folder, I guess. Regards Sujatha On Thu, Jun 9, 2011 at 2:03 PM, roySolr royrutten1...@gmail.com wrote: I use Jetty, it's standard in the solr

Re: Solr monitoring: Newrelic

2011-06-09 Thread roySolr
Yes, that's the problem. There is no jetty folder. I have tried the example/lib directory, it's not working. There is no jetty war file, only jetty-***.jar files. Same error: could not locate a Jetty instance.

Re: Displaying highlights in formatted HTML document

2011-06-09 Thread Ahmet Arslan
iorixxx, could you please explain a bit more your solution, because I don't see how your solution could give an exact highlighting, I mean with the different fields analysis for each fields. It does not work with your use case (e.g. different synonyms applied different parts of the html/xml

ExtractingRequestHandler - renaming tika generated fields

2011-06-09 Thread Jan Høydahl
Hi, I post a PDF from a CMS client, which has metadata about the document. One of those metadata is the title. I trust the title of the CMS more than the title extracted from the PDF, but I cannot find a way to both send literal.title=CMS-Title as well as changing the name of the title field

Re: how to Index and Search non-Eglish Text in solr

2011-06-09 Thread Mohammad Shariq
Can I specify multiple languages in the filter tag in schema.xml? Like below: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true"

Re: Solr monitoring: Newrelic

2011-06-09 Thread Sujatha Arun
Try the RPM support accessed from the account support page, giving all details; they are very helpful. Regards Sujatha On Thu, Jun 9, 2011 at 2:33 PM, roySolr royrutten1...@gmail.com wrote: Yes, that's the problem. There is no jetty folder. I have try the example/lib directory, it's not

Re: AW: How to deal with many files using solr external file field

2011-06-09 Thread Martin Grotzke
Hi, as I'm also involved in this issue (on the side of Sven) I created a patch, that replaces the float array by a map that stores score by doc, so it contains as many entries as the external scoring file contains lines, but no more. I created an issue for this:
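The memory trade-off described in this message (a map with one entry per line of the scoring file, instead of an array with one slot per document) can be sketched as follows. This is a Python illustration only, not the Java patch: the names `load_scores` and `score_for`, the `key=value` line format, and the default of 0.0 are assumptions made for the example.

```python
def load_scores(lines):
    """Parse 'key=value' lines into a dict holding one entry per line."""
    scores = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.rpartition("=")
        scores[key] = float(value)
    return scores

def score_for(scores, doc_key, default=0.0):
    # Documents missing from the file fall back to a default value.
    return scores.get(doc_key, default)

scores = load_scores(["doc1=1.5", "doc7=0.25", ""])
print(len(scores), score_for(scores, "doc7"), score_for(scores, "doc2"))
```

With many small files covering few documents each, the map grows with the file size rather than with the index size, which is the point the message makes.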

Re: Tokenising based on known words?

2011-06-09 Thread lee carroll
we've played with HyphenationCompoundWordTokenFilterFactory it works better than maintaining a word dictionary to split (although we ended up not using it for reasons i can't recall) see http://lucene.apache.org/solr/api/org/apache/solr/analysis/HyphenationCompoundWordTokenFilterFactory.html

Boost or sort a query with range values

2011-06-09 Thread jlefebvre
Hello, I am trying to boost a query with a range of values but I can't find the correct syntax. This is OK: bq=myfield:-1^5 but I want to do something like this: bq=myfield:-1 to 1^5 (boost values from -1 to 1). Thanks

Re: Boost or sort a query with range values

2011-06-09 Thread lee carroll
[* TO *]^5 On 9 June 2011 11:31, jlefebvre jlefeb...@allocine.fr wrote: Hello I try to boost a query with a range values but I can't find the correct syntax : this is ok .bq=myfield:-1^5 but I want to do something lik this bq=myfield:-1 to 1^5 Boost value from -1 to 1 thanks --
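The reply gives the open-ended range form. For the specific bounds asked about (-1 to 1), the same bracket syntax can be filled in. A minimal Python sketch that only assembles the clause string; the helper name `range_boost` is hypothetical, and whether negative bounds parse as intended depends on the field type, which is not shown in the thread.

```python
def range_boost(field, low, high, boost):
    """Build a boosted inclusive range clause such as field:[-1 TO 1]^5."""
    return f"{field}:[{low} TO {high}]^{boost}"

clause = range_boost("myfield", -1, 1, 5)
print(clause)
```

The resulting string would be passed as the value of the bq parameter.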

Re: Boost or sort a query with range values

2011-06-09 Thread jlefebvre
Thanks, it's OK. Another question: how to do a condition in bq? Something like bq=iif(myfield1 = 0 AND myfield2 = 1;1;0) Thanks

Re: Boost or sort a query with range values

2011-06-09 Thread Jan Høydahl
Check the new if() function in Trunk, SOLR-2136. You could then use it in bf= or boost= -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. juni 2011, at 13.05, jlefebvre wrote: thanks it's ok another question how to do a

Re: Boost or sort a query with range values

2011-06-09 Thread Jan Høydahl
Btw. your example is a simple boolean query, and this will also work: bq=(myfield1:0 AND myfield2:1)^100.0 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. juni 2011, at 13.31, Jan Høydahl wrote: Check the new if() function
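A boost query like the one above contains parentheses, colons, spaces and a caret, so it needs URL encoding when sent over HTTP. A minimal Python sketch of that step; the helper name `boosted_params`, the sample q value, and the use of defType=dismax are assumptions for illustration.

```python
from urllib.parse import urlencode

def boosted_params(user_query, boost_query):
    """Return a URL-encoded parameter string carrying q and bq."""
    params = {"defType": "dismax", "q": user_query, "bq": boost_query}
    return urlencode(params)

query_string = boosted_params("test", "(myfield1:0 AND myfield2:1)^100.0")
print(query_string)
```

The string can be appended after the "?" of a select URL; the server decodes it back to the original boolean clause.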

Re: London open source search social - 13th June

2011-06-09 Thread Richard Marr
Just a quick reminder that we're meeting on Monday. Come along if you're around. On 1 June 2011 13:27, Richard Marr richard.m...@gmail.com wrote: Hi guys, Just to let you know we're meeting up to talk all-things-search on Monday 13th June. There's usually a good mix of backgrounds and

[Mahout] Integration with Solr

2011-06-09 Thread Adam Estrada
Has anyone integrated Mahout with Solr? I know that Carrot2 is part of the core build but the docs say that it's not very good for very large indexes. Anyone have thoughts on this? Thanks, Adam

Re: Tokenising based on known words?

2011-06-09 Thread Mark Mandel
Synonyms really wouldn't work for every possible combination of words in our index. Thanks for the idea though. Mark On Thu, Jun 9, 2011 at 3:42 PM, Gora Mohanty g...@mimirtech.com wrote: On Thu, Jun 9, 2011 at 4:37 AM, Mark Mandel mark.man...@gmail.com wrote: Not sure if this possible, but

Edismax sorting help

2011-06-09 Thread Denis Kuzmenok
Hi, everyone. I have fields: text fields: name, title, text boolean field: isflag (true / false) int field: popularity (0 to 9) Now i do query: defType=edismax start=0 rows=20 fl=id,name q=lg optimus fq= qf=name^3 title text^0.3 sort=score desc pf=name bf=isflag sqrt(popularity) mm=100%

Re: tika integration exception and other related queries

2011-06-09 Thread Gary Taylor
Naveen, Not sure our requirement matches yours, but one of the things we index is a comment item that can have one or more files attached to it. To index the whole thing as a single Solr document we create a zipfile containing a file with the comment details in it and any additional

Re: [Mahout] Integration with Solr

2011-06-09 Thread Tomás Fernández Löbbe
I don't know much of it, but I know Grant Ingersoll posted about that: http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ On Thu, Jun 9, 2011 at 9:24 AM, Adam Estrada estrada.adam.gro...@gmail.com wrote: Has anyone integrated Mahout

RE: Tokenising based on known words?

2011-06-09 Thread Steven A Rowe
Hi Mark, Are you familiar with shingles aka token n-grams? http://lucene.apache.org/solr/api/org/apache/solr/analysis/ShingleFilterFactory.html Use the empty string for the tokenSeparator to get wordstogether style tokens in your index. I think you'll want to apply this filter only at
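To make the shingle idea concrete, here is a minimal Python sketch of token n-grams joined with an empty separator. It is not the Solr filter: the function name `shingles` and the sample tokens are hypothetical, and the real filter has further behavior (for example, it can also emit the original single tokens) that this sketch leaves out.

```python
def shingles(tokens, max_size=2, separator=""):
    """Return token n-grams of size 2..max_size joined by separator."""
    out = []
    for size in range(2, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(separator.join(tokens[start:start + size]))
    return out

pairs = shingles(["words", "together", "style"])
print(pairs)
```

Indexing such joined tokens is what lets a query typed without spaces match text that was written as separate words.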

how can I return function results in my query?

2011-06-09 Thread Jason Toy
I want to be able to run a query like idf(text, 'term') and have that data returned with my search results. I've searched the docs,but I'm unable to find how to do it. Is this possible and how can I do that ?

Re: how can I return function results in my query?

2011-06-09 Thread Ahmet Arslan
I want to be able to run a query  like idf(text, 'term') and have that data returned with my search results.  I've searched the docs,but I'm unable to find how to do it.  Is this possible and how can I do that ? http://wiki.apache.org/solr/FunctionQuery#idf

Re: how to Index and Search non-Eglish Text in solr

2011-06-09 Thread Erick Erickson
No, you'd have to create multiple fieldTypes, one for each language Best Erick On Thu, Jun 9, 2011 at 5:26 AM, Mohammad Shariq shariqn...@gmail.com wrote: Can I specify multiple language in filter tag in schema.xml ???  like below fieldType name=text class=solr.TextField

Re: Edismax sorting help

2011-06-09 Thread Yonik Seeley
2011/6/9 Denis Kuzmenok forward...@ukr.net: Hi, everyone. I have fields: text fields: name, title, text boolean field: isflag (true / false) int field: popularity (0 to 9) Now i do query: defType=edismax start=0 rows=20 fl=id,name q=lg optimus fq= qf=name^3 title text^0.3

Re: Solr monitoring: Newrelic

2011-06-09 Thread Ken Krugler
It sounds like roySolr is running embedded Jetty, launching solr using the start.jar If so, then there's no app container where Newrelic can be installed. -- Ken On Jun 9, 2011, at 2:28am, Sujatha Arun wrote: Try the RPM support accessed from the accout support page ,Giving all details

Re: Edismax sorting help

2011-06-09 Thread Denis Kuzmenok
Your solution seems to work fine, not perfect, but much better then mine :) Thanks! If i do query like Samsung i want to see prior most relevant results with  isflag:true and bigger popularity, but if i do query like Nokia 6500  and  there is isflag:false, then it should be higher because

Re: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-09 Thread Koji Sekiguchi
(11/06/09 4:24), Burton-West, Tom wrote: We are trying to implement highlighting for wildcard (MultiTerm) queries. This seems to work fine with the regular highlighter but when we try to use the fastVectorHighlighter we don't see any results in the highlighting section of the response.

Re: [Mahout] Integration with Solr

2011-06-09 Thread Tommaso Teofili
Hello Adam, I've managed to create a small POC of integrating Mahout with Solr for a clustering task, do you want to use it for clustering only or possibly for other purposes/algorithms? More generally speaking, I think it'd be nice if Solr could be extended with a proper API for integrating

Indexing data from multiple datasources

2011-06-09 Thread Greg Georges
Hello all, I have checked the forums to see if it is possible to create and index from multiple datasources. I have found references to SOLR-1358, but I don't think this fits my scenario. In all, we have an application where we upload files. On the file upload, I use the Tika extract handler

[Free Text] Field Tokenizing

2011-06-09 Thread Adam Estrada
All, I am at a bit of a loss here so any help would be greatly appreciated. I am using the DIH to grab data from a DB. The field that I am most interested in has anywhere from 1 word to several paragraphs worth of free text. What I would really like to do is pull out phrases like Joe's coffee

Re: [Mahout] Integration with Solr

2011-06-09 Thread Adam Estrada
Thanks for the reply, Tommaso! I would like to see tighter integration like in the way Nutch integrates with Solr. There is a single param that you set which points to the Solr instance. My interest in Mahout is with its ability to handle large data and find frequency, co-location of data,

RE: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-09 Thread Burton-West, Tom
Hi Koji, Thank you for your reply. It is the feature of FVH. FVH supports TermQuery, PhraseQuery, BooleanQuery and DisjunctionMaxQuery and Query constructed by those queries. Sorry, I'm not sure I understand. Are you saying that FVH supports MultiTerm highlighting? Tom

Re: ExtractingRequestHandler - renaming tika generated fields

2011-06-09 Thread Jan Høydahl
One solution to this problem is to change the order of field operation (http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations) to first do fmap.*= processing, then add the fields from literal.*=. Why would anyone want to rename a field they just have explicitly named

Re: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-09 Thread Koji Sekiguchi
(11/06/10 0:14), Burton-West, Tom wrote: Hi Koji, Thank you for your reply. It is the feature of FVH. FVH supports TermQuery, PhraseQuery, BooleanQuery and DisjunctionMaxQuery and Query constructed by those queries. Sorry, I'm not sure I understand. Are you saying that FVH supports

Re: Indexing data from multiple datasources

2011-06-09 Thread Erick Erickson
Hmmm, when you say you use Tika, are you using some custom Java code? Because if you are, the best thing to do is query your database at that point and add whatever information you need to the document. If you're using DIH to do the crawl, consider implementing a Transformer to do the database

Re: [Free Text] Field Tokenizing

2011-06-09 Thread Erick Erickson
The problem here is that none of the built-in filters or tokenizers have a prayer of recognizing what #you# think are phrases, since it'll be unique to your situation. If you have a list of phrases you care about, you could substitute a single token for the phrases you care about... But the

Re: [Free Text] Field Tokenizing

2011-06-09 Thread Adam Estrada
Erick, I totally understand that BUT the keyword tokenizer factory does a really good job extracting phrases (or what look like phrases from) from my data. I don't know why exactly but it does do it. I am going to continue working through it to see if I can't figure it out ;-) Adam On Thu, Jun

Re: [Free Text] Field Tokenizing

2011-06-09 Thread Erick Erickson
The KeywordTokenizer doesn't do anything to break up the input stream, it just treats the whole input to the field as a single token. So I don't think you'll be able to extract anything starting with that tokenizer. Look at the admin/analysis page to see a step-by-step breakdown of what your
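The difference described here can be mimicked in a few lines. A minimal Python sketch, with hypothetical function names and sample text, contrasting a tokenizer that keeps the whole input as one token with one that splits on whitespace; it does not reproduce the actual Lucene classes.

```python
def keyword_tokenize(text):
    """Mimic a keyword-style tokenizer: the whole input is one token."""
    return [text]

def whitespace_tokenize(text):
    """Mimic a whitespace-style tokenizer: split on runs of whitespace."""
    return text.split()

sample = "Joe's coffee is great"
print(keyword_tokenize(sample))
print(whitespace_tokenize(sample))
```

With the keyword-style approach a short field value survives intact, which may look like phrase extraction, but a multi-paragraph value also stays a single token.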

RE: Indexing data from multiple datasources

2011-06-09 Thread Greg Georges
Hello Erick, Thanks for the response. No, I am using the extract handler to extract the data from my text files. In your second approach, you say I could use a DIH to update the index which would have been created by the extract handler in the first phase. I thought that lets say I get info

Re: Indexing data from multiple datasources

2011-06-09 Thread Erick Erickson
How are you using it? Streaming the files to Solr via HTTP? You can use Tika on the client to extract the various bits from the structured documents, and use SolrJ to assemble various bits of that data Tika exposes into a Solr document that you then send to Solr. At the point you're transferring

RE: Indexing data from multiple datasources

2011-06-09 Thread David Ross
This thread got me thinking a bit... Does SOLR support the concept of partial updates to documents? By this I mean updating a subset of fields in a document that already exists in the index, and without having to resubmit the entire document. An example would be storing/indexing user tags

RE: Indexing data from multiple datasources

2011-06-09 Thread Greg Georges
No from what I understand, the way Solr does an update is to delete the document, then recreate all the fields, there is no partial updating of the file.. maybe because of performance issues or locking? -Original Message- From: David Ross [mailto:davidtr...@hotmail.com] Sent: 9 juin

Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, there seems to be no way to index CSV using the DataImportHandler. Using a combination of LineEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor) and RegexTransformer (http://wiki.apache.org/solr/DataImportHandler#RegexTransformer) as proposed in

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, to make my point more clear: if the CSV has a fixed schema / column layout, using the RegexTransformer is of course a possibility (however awkward). But if you want to implement a (more or less) schema free shopping search engine ... regards On Thu, Jun 9, 2011 at 9:31 PM, Helmut Hoffer von

Unique Results from Edgy Text

2011-06-09 Thread Jamie Johnson
I am using the guide found here ( http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/) to build an autocomplete search capability but in my data set I have some documents which have the same value for the field that is being returned, so for instance

RE: Processing/Indexing CSV

2011-06-09 Thread Dyer, James
Helmut, I recently submitted SOLR-2549 (https://issues.apache.org/jira/browse/SOLR-2549) to handle both fixed-width and delimited flat files. To be honest, I only needed fixed-width support for my app so this might not support everything you mention for delimited files, but it should be a

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
Ludovic, how do you index your html files ? I mean do you create fields for different parts of your document (for different stop words lists, stemming, etc) ? with DIH or solrj or something else ? We are sending them over http, and using Tika to strip the HTML, at present. We do not split

Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen helmut...@googlemail.com wrote: Hi, there seems to be no way to index CSV using the DataImportHandler. Looking over the features you want, it looks like you're starting from a CSV file (as opposed to CSV stored in a database). Is

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread lboutros
I am not (yet) a Tika user; perhaps iorixxx's solution is good for you. We will share the highlighter module and 2 other developments soon (we have to see how to do that). Ludovic. - Jouve France.

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, just looked at your code. Definitely an improvement :-) The problem with the double-quotes is, that the delimiter (let's say ',') might be part of the column value. The goal is to process something like this without any tricky configuration name1,name2,name3 val1,val2,...,val3 ... The user
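The quoting problem mentioned here, where the delimiter appears inside a column value, is what a CSV-aware parser handles and a plain split or simple regex does not. A minimal sketch using Python's standard csv module; the helper name `parse_rows` and the sample data are hypothetical, and this is not how DIH or the Solr CSV loader is implemented.

```python
import csv
import io

def parse_rows(text, delimiter=","):
    """Parse CSV text, honouring double quotes around values."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

rows = parse_rows('name1,name2,name3\nval1,"val2, with comma",val3\n')
print(rows)
```

The quoted value stays one column even though it contains the delimiter, so the header row and the data row keep the same length.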

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
s/provide and/provide any/ig ,-) On Thu, Jun 9, 2011 at 10:01 PM, Helmut Hoffer von Ankershoffen helmut...@googlemail.com wrote: Hi, just looked at your code. Definitely an improvement :-) The problem with the double-quotes is, that the delimiter (let's say ',') might be part of the

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
-Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Wednesday, June 08, 2011 11:56 PM To: solr-user@lucene.apache.org Subject: Re: Displaying highlights in formatted HTML document --- On Thu, 6/9/11, Bryan Loofbourrow bloofbour...@knowledgemosaic.com wrote:

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, yes, it's about CSV files loaded via HTTP from shops to be fed into a shopping search engine. The CSV Loader cannot map fields (only field values) etc. DIH is flexible enough for building the importing part of such a thing but misses elegant handling of CSV data ... Regards On Thu, Jun 9,

Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen helmut...@googlemail.com wrote: Hi, yes, it's about CSV files loaded via HTTP from shops to be fed into a shopping search engine. The CSV Loader cannot map fields (only field values) etc. You can provide your own list of

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, ... that would be an option if there is a defined set of field names and a single column/CSV layout. The scenario however is different csv files (from different shops) with individual column layouts (separators, encodings etc.). The idea is to map known field names to defined field names in
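The scenario of several shops, each with its own delimiter and column names, mapped onto one shared set of field names could be sketched as below. Everything specific in this Python example is hypothetical: the shop names, the layout table `SHOP_LAYOUTS`, the column names, and the function `normalize`. It is a sketch of the mapping idea, not a proposal for how DIH should do it.

```python
import csv
import io

# Hypothetical per-shop layouts: delimiter plus source-to-target column names.
SHOP_LAYOUTS = {
    "shop_a": {"delimiter": ",", "fields": {"title": "name", "cost": "price"}},
    "shop_b": {"delimiter": ";", "fields": {"produkt": "name", "preis": "price"}},
}

def normalize(shop, text):
    """Read one shop's CSV and rename its columns to the shared field names."""
    layout = SHOP_LAYOUTS[shop]
    reader = csv.DictReader(io.StringIO(text), delimiter=layout["delimiter"])
    docs = []
    for row in reader:
        # Keep only mapped columns, renamed to the shared field names.
        docs.append({target: row[source]
                     for source, target in layout["fields"].items()
                     if source in row})
    return docs

docs_a = normalize("shop_a", "title,cost,extra\nPhone,99,x\n")
docs_b = normalize("shop_b", "produkt;preis\nTelefon;89\n")
print(docs_a, docs_b)
```

Both feeds end up as documents with the same keys, so the indexing step after this point no longer needs to know which shop a row came from.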

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Ahmet Arslan
OK, I think see what you're up to. Might be pretty viable for me as well. Can you talk about anything in your mappings.txt files that is an important part of the solution? It is not important. I just copied it. Plus html strip char filter does not have mappings parameter. It was a copy

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
OK, I think see what you're up to. Might be pretty viable for me as well. Can you talk about anything in your mappings.txt files that is an important part of the solution? It is not important. I just copied it. Plus html strip char filter does not have mappings parameter. It was a

RE: solr Invalid Date in Date Math String/Invalid Date String

2011-06-09 Thread Chris Hostetter
: Here is the error message: : : Fieldtype: tdate (I use the default one in solr schema.xml) : Field value(Index): 2006-12-22T13:52:13Z : Field value(query): [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z] : with '[' and ']' : : And it generates the result below: i think the piece of info

Re: Processing/Indexing CSV

2011-06-09 Thread Ken Krugler
On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: Hi, ... that would be an option if there is a defined set of field names and a single column/CSV layout. The scenario however is different csv files (from different shops) with individual column layouts (separators, encodings

Re: Solr Indexing Patterns

2011-06-09 Thread Judioo
Very informative links and statement Jonathan. thank you. On 6 June 2011 20:55, Jonathan Rochkind rochk...@jhu.edu wrote: This is a start, for many common best practices: http://wiki.apache.org/solr/SolrRelevancyFAQ Many of the questions in there have an answer that involves

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler kkrugler_li...@transpac.com wrote: On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: Hi, ... that would be an option if there is a defined set of field names and a single column/CSV layout. The scenario however is different

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, btw: there seems to be somewhat of a mismatch between the efforts to enhance DIH regarding the CSV format (James Dyer) and the effort to maintain the CSVLoader (Ken Krugler). How about merging your efforts and migrating the CSVLoader to a CSVEntityProcessor (cp. my initial email)? :-) Best

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Ahmet Arslan
Yes, I asked the wrong question. What I was subconsciously getting at is this: how are you avoiding the possibility of getting hits in the HTML elements? Is that accomplished by putting tag names in your stopwords, or by some other mechanism? HtmlStripCharFilter removes html tags. After it

Re: Processing/Indexing CSV

2011-06-09 Thread Ken Krugler
On Jun 9, 2011, at 2:21pm, Helmut Hoffer von Ankershoffen wrote: Hi, btw: there seems to somewhat of a non-match regarding efforts to Enhance DIH regarding the CSV format (James Dyer) and the effort to maintain the CSVLoader (Ken Krugler). How about merging your efforts and migrating the

SolrCloud questions

2011-06-09 Thread Upayavira
I'm exploring SolrCloud for a new project, and have some questions based upon what I've found so far. The setup I'm planning is going to have a number of multicore hosts, with cores being moved between hosts, and potentially with cores merging as they get older (cores are time based, so once

Re: Tokenising based on known words?

2011-06-09 Thread Mark Mandel
Thanks for the feedback! This definitely gives me some options to work on! Mark On Thu, Jun 9, 2011 at 11:21 PM, Steven A Rowe sar...@syr.edu wrote: Hi Mark, Are you familiar with shingles aka token n-grams?

Where to find the Log file

2011-06-09 Thread Ruixiang Zhang
Where can I find the log file of solr? Is it turned on by default? (I use Jetty) Thanks Ruixiang

Re: Boosting result on query.

2011-06-09 Thread Jeff Boul
Hi, thank you for your answer. But I cannot use a boost calculated offline, since the boost will change depending on the query made. Each query will boost the query differently. Any other ideas? Jeff

Re: Where to find the Log file

2011-06-09 Thread Jack Repenning
On Jun 9, 2011, at 5:45 PM, Ruixiang Zhang wrote: Where can I find the log file of solr? (I use Jetty) By default, it's in yourapp/solr/logs/solr.log Is it turned on by default? Yes. Oh, yes. Very much so. Uh-huh, you betcha. -==- Jack Repenning Technologist Codesion Business Unit

Re: Where to find the Log file

2011-06-09 Thread Morris Mwanga
Here's help on how to setup logging http://skybert.wordpress.com/2009/07/22/how-to-get-solr-to-log-to-a-log-file/ - Morris - Original Message - From: Ruixiang Zhang rxzh...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, June 9, 2011 8:45:30 PM GMT -05:00 US/Canada Eastern

Re: tika integration exception and other related queries

2011-06-09 Thread Naveen Gupta
Hi Gary, Similar thing we are doing, but we are not creating an XML doc, rather we are leaving TIKA to extract the content and depends on dynamic fields. We are not storing the text as well. But not sure if in future that would be the case. What about microsoft 7 and later related attachments.

ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen Gupta
Hi, this is my document in PHP: $xmldoc = '<add><doc><field name="id">F_146</field><field name="userid">74</field><field name="groupuseid">gmail.com</field><field name="attachment_size">121</field><field name="attachment_name">sample.pptx</field></doc></add>'; $ch = curl_init("http://localhost:8080/solr/update");

Re: how to Index and Search non-Eglish Text in solr

2011-06-09 Thread Mohammad Shariq
Thanks Erick for your help. I have another silly question. Suppose I created multiple fieldTypes, e.g. news_English, news_Chinese, news_Japnese etc. After creating these fields, can I copy all of them to the copyField defaultquery like below: <copyField source="news_English" dest="defaultquery"/> <copyField

Re: Multiple Values not getting Indexed

2011-06-09 Thread Pawan Darira
it did not work :( On Thu, Jun 9, 2011 at 12:53 PM, Bill Bell billnb...@gmail.com wrote: You have to take the input and splitBy something like , to get it into an array and reposted back to Solr... I believe others have suggested that? On 6/8/11 10:14 PM, Pawan Darira

Re: Multiple Values not getting Indexed

2011-06-09 Thread Gora Mohanty
On Fri, Jun 10, 2011 at 10:36 AM, Pawan Darira pawan.dar...@gmail.com wrote: it did not work :( [...] Please provide more details of what you tried, what was the error, and any error messages that you got. Just saying that it did not work makes it pretty much impossible for anyone to help you.

Re: ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen Gupta
Hi, curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' Regards Naveen On Fri, Jun 10, 2011 at 10:18 AM, Naveen Gupta nkgiit...@gmail.com wrote: Hi This is my document in php $xmldoc = 'adddocfield

Re: ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen
Hi, basically I need to post something like this using curl in PHP. The example from the earlier thread: curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' Should we need to create a temp file and
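The curl command in this message boils down to an HTTP POST with the XML as the request body and a text/xml content type. A minimal Python sketch of that request shape, using only the standard library; the helper name `build_update_request` is hypothetical and the request is built but not sent, so no server is assumed. In PHP the same two pieces (body and header) would presumably go into the curl handle's CURLOPT_POSTFIELDS and CURLOPT_HTTPHEADER options, in which case no temp file should be required, though that is not confirmed in the thread.

```python
import urllib.request

def build_update_request(url, xml_body):
    """Build (but do not send) a POST request carrying an XML update."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),  # a body makes urllib use POST
        headers={"Content-Type": "text/xml"},
    )

request = build_update_request(
    "http://localhost:8983/solr/update?commit=true",
    '<add><doc><field name="id">testdoc</field></doc></add>',
)
# urllib.request.urlopen(request) would send it to a running server.
print(request.get_method(), request.full_url)
```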

Re: SolrCloud questions

2011-06-09 Thread Mohammad Shariq
I am also planning to move to SolrCloud; since it's still under development, I am not sure about its behavior in production. Please update us once you find it stable. On 10 June 2011 03:56, Upayavira u...@odoko.co.uk wrote: I'm exploring SolrCloud for a new project, and have some questions