In Solr 1.4.1, to get the distinct facet terms count across shards,
the piece of code added to the distributed process is as follows:
Class: FacetComponent.java
Function: finishStage(ResponseBuilder rb)
for (DistribFieldFacet dff :
--- On Thu, 6/9/11, Bryan Loofbourrow bloofbour...@knowledgemosaic.com wrote:
From: Bryan Loofbourrow bloofbour...@knowledgemosaic.com
Subject: Displaying highlights in formatted HTML document
To: solr-user@lucene.apache.org
Date: Thursday, June 9, 2011, 2:14 AM
Here is my use case:
After switching to Solr 3.2 and building a new index from scratch I ran
CheckIndex, which reports:
Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1]
Why do I get FORMAT_3_1 and Lucene 3.1? Is anything wrong with my index?
from my schema.xml:
<schema name="my_solr320_schema">
Pawan,
just separating multiple values by comma does not make them
multi-value in solr-speak. But if you're already using DIH, you may
try the http://wiki.apache.org/solr/DataImportHandler#RegexTransformer
to 'splitBy' the field and get the expected field-values
Regards
Stefan
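A minimal DIH sketch of that splitBy usage (the entity and column names here are illustrative, not from the original mail):

```xml
<entity name="item" transformer="RegexTransformer"
        query="SELECT id, category FROM items">
  <!-- splitBy turns 'a,b,c' into three separate field values -->
  <field column="category" splitBy=","/>
</entity>
```

The target field must be declared multiValued="true" in schema.xml for the split values to be kept.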
On Thu, Jun 9,
I have coded and tested this and it appears to work.
Are you having any problems?
On 6/9/11 12:35 AM, rajini maski rajinima...@gmail.com wrote:
In solr 1.4.1, for getting distinct facet terms count across shards,
The piece of code added for getting count of distinct facet terms across
Is there a way to splitBy and trim the field after splitting?
I know I can do it with Javascript in DIH, but how about using the regex
parser?
On 6/9/11 1:18 AM, Stefan Matheis matheis.ste...@googlemail.com wrote:
Pawan,
just separating multiple values by comma does not make them
multi-value
You have to take the input and splitBy something like ',' to get it into
an array and repost it back to
Solr...
I believe others have suggested that?
On 6/8/11 10:14 PM, Pawan Darira pawan.dar...@gmail.com wrote:
Hi
I am trying to index 2 fields with multiple values. BUT, it is only
putting 1
Hello,
I found this tool to monitor solr querys, cache etc.
http://newrelic.com/
I have some problems with the installation of it. I get the following
errors:
Could not locate a Tomcat, Jetty or JBoss instance in /var/www/sites/royr
Try re-running the install command
You need to install the new relic folder under the tomcat folder, in case
the app server is tomcat.
Then from the command line, you need to run the install command given on
the new relic site from your newrelic folder.
Once this is done, restart the appserver and you should be able to see a log
file
I use Jetty, it's standard in the solr package. Where can I find
the jetty folder?
Then I can run this command:
java -jar newrelic.jar install
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-monitoring-Newrelic-tp3042889p3042981.html
Sent from the Solr - User
Hi Bryan,
how do you index your html files ? I mean do you create fields for different
parts of your document (for different stop words lists, stemming, etc) ?
with DIH or solrj or something else ?
iorixxx, could you please explain a bit more your solution, because I don't
see how your solution
There is no jetty folder in the standard package, but the jetty war file
is under the example/lib folder, so this is where you need to put the newrelic
folder, I guess
Regards
Sujatha
On Thu, Jun 9, 2011 at 2:03 PM, roySolr royrutten1...@gmail.com wrote:
I use Jetty, it's standard in the solr
Yes, that's the problem. There is no jetty folder.
I have tried the example/lib directory, it's not working. There is no jetty
war file, only
jetty-***.jar files
Same error, could not locate a jetty instance.
iorixxx, could you please explain a bit more your solution, because I don't
see how your solution could give an exact highlighting, I mean with the
different fields analysis for each field.
It does not work with your use case (e.g. different synonyms applied to different
parts of the html/xml
Hi,
I post a PDF from a CMS client, which has metadata about the document. One of
those metadata is the title. I trust the title of the CMS more than the title
extracted from the PDF, but I cannot find a way to both send
literal.title=CMS-Title as well as changing the name of the title field
Can I specify multiple language in filter tag in schema.xml ??? like below
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
Try the RPM support accessed from the account support page, giving all
details; they are very helpful.
Regards
Sujatha
On Thu, Jun 9, 2011 at 2:33 PM, roySolr royrutten1...@gmail.com wrote:
Yes, that's the problem. There is no jetty folder.
I have try the example/lib directory, it's not
Hi,
as I'm also involved in this issue (on the side of Sven) I created a
patch, that replaces the float array by a map that stores score by doc,
so it contains as many entries as the external scoring file contains
lines, but no more.
I created an issue for this:
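The idea behind that patch can be sketched roughly like this (an illustrative simplification, not the actual patch code; the class name is made up):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: keep external-file scores in a map keyed by doc id instead of
// a float[maxDoc] array, so memory grows with the number of lines in the
// external scoring file rather than with the size of the index.
public class SparseExternalScores {
    private final Map<Integer, Float> scoreByDoc = new HashMap<Integer, Float>();
    private final float defaultScore;

    public SparseExternalScores(float defaultScore) {
        this.defaultScore = defaultScore;
    }

    public void put(int docId, float score) {
        scoreByDoc.put(docId, score);
    }

    // Docs absent from the external file fall back to the default value.
    public float score(int docId) {
        Float s = scoreByDoc.get(docId);
        return s == null ? defaultScore : s;
    }
}
```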
we've played with HyphenationCompoundWordTokenFilterFactory; it works
better than maintaining a word dictionary to split on (although we ended
up not using it for reasons I can't recall)
see
http://lucene.apache.org/solr/api/org/apache/solr/analysis/HyphenationCompoundWordTokenFilterFactory.html
Hello
I am trying to boost a query with a range of values but I can't find the correct
syntax:
this is OK: bq=myfield:-1^5, but I want to do something like this:
bq=myfield:-1 to 1^5
Boost value from -1 to 1
thanks
[* TO *]^5
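Presumably with the field and bounds from the question filled in, Solr's bracketed range syntax would give:

```
bq=myfield:[-1 TO 1]^5
```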
On 9 June 2011 11:31, jlefebvre jlefeb...@allocine.fr wrote:
Hello
I am trying to boost a query with a range of values but I can't find the correct
syntax:
this is OK: bq=myfield:-1^5, but I want to do something like this:
bq=myfield:-1 to 1^5
Boost value from -1 to 1
thanks
thanks, it's OK
another question:
how to do a condition in bq?
something like bq=iif(myfield1 = 0 AND myfield2 = 1;1;0)
thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/Boost-or-sort-a-query-with-range-values-tp3043328p3043406.html
Sent from the Solr - User mailing
Check the new if() function in Trunk, SOLR-2136. You could then use it in bf=
or boost=
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 9. juni 2011, at 13.05, jlefebvre wrote:
thanks it's ok
another question
how to do a
Btw. your example is a simple boolean query, and this will also work:
bq=(myfield1:0 AND myfield2:1)^100.0
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 9. juni 2011, at 13.31, Jan Høydahl wrote:
Check the new if() function
Just a quick reminder that we're meeting on Monday. Come along if you're
around.
On 1 June 2011 13:27, Richard Marr richard.m...@gmail.com wrote:
Hi guys,
Just to let you know we're meeting up to talk all-things-search on Monday
13th June. There's usually a good mix of backgrounds and
Has anyone integrated Mahout with Solr? I know that Carrot2 is part of the
core build but the docs say that it's not very good for very large indexes.
Anyone have thoughts on this?
Thanks,
Adam
Synonyms really wouldn't work for every possible combination of words in our
index.
Thanks for the idea though.
Mark
On Thu, Jun 9, 2011 at 3:42 PM, Gora Mohanty g...@mimirtech.com wrote:
On Thu, Jun 9, 2011 at 4:37 AM, Mark Mandel mark.man...@gmail.com wrote:
Not sure if this is possible, but
Hi, everyone.
I have fields:
text fields: name, title, text
boolean field: isflag (true / false)
int field: popularity (0 to 9)
Now i do query:
defType=edismax
start=0
rows=20
fl=id,name
q=lg optimus
fq=
qf=name^3 title text^0.3
sort=score desc
pf=name
bf=isflag sqrt(popularity)
mm=100%
Naveen,
Not sure our requirement matches yours, but one of the things we index
is a comment item that can have one or more files attached to it. To
index the whole thing as a single Solr document we create a zipfile
containing a file with the comment details in it and any additional
I don't know much of it, but I know Grant Ingersoll posted about that:
http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
On Thu, Jun 9, 2011 at 9:24 AM, Adam Estrada
estrada.adam.gro...@gmail.comwrote:
Has anyone integrated Mahout
Hi Mark,
Are you familiar with shingles aka token n-grams?
http://lucene.apache.org/solr/api/org/apache/solr/analysis/ShingleFilterFactory.html
Use the empty string for the tokenSeparator to get 'wordstogether'-style tokens
in your index.
I think you'll want to apply this filter only at
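An index-time analyzer along those lines might look like this (a sketch; the field type name and the surrounding filters are illustrative):

```xml
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- tokenSeparator="" glues adjacent words into "wordstogether" tokens -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            tokenSeparator="" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```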
I want to be able to run a query like idf(text, 'term') and have that data
returned with my search results. I've searched the docs, but I'm unable to
find how to do it. Is this possible and how can I do that?
I want to be able to run a query like idf(text, 'term') and have that data
returned with my search results. I've searched the docs, but I'm unable to
find how to do it. Is this possible and how can I do that?
http://wiki.apache.org/solr/FunctionQuery#idf
No, you'd have to create multiple fieldTypes, one for each language
Best
Erick
On Thu, Jun 9, 2011 at 5:26 AM, Mohammad Shariq shariqn...@gmail.com wrote:
Can I specify multiple language in filter tag in schema.xml ??? like below
fieldType name=text class=solr.TextField
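For example, one fieldType per language might be sketched as follows (the analyzer choices are illustrative, not a recommendation):

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
  </analyzer>
</fieldType>
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- CJK text needs its own tokenization rather than whitespace splitting -->
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
```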
2011/6/9 Denis Kuzmenok forward...@ukr.net:
Hi, everyone.
I have fields:
text fields: name, title, text
boolean field: isflag (true / false)
int field: popularity (0 to 9)
Now i do query:
defType=edismax
start=0
rows=20
fl=id,name
q=lg optimus
fq=
qf=name^3 title text^0.3
It sounds like roySolr is running embedded Jetty, launching solr using the
start.jar
If so, then there's no app container where Newrelic can be installed.
-- Ken
On Jun 9, 2011, at 2:28am, Sujatha Arun wrote:
Try the RPM support accessed from the accout support page ,Giving all
details
Your solution seems to work fine, not perfect, but much better than
mine :)
Thanks!
If I do a query like Samsung I want to see the most relevant results
with isflag:true and bigger popularity first, but if I do a query like Nokia
6500 and there is isflag:false, then it should be higher because
(11/06/09 4:24), Burton-West, Tom wrote:
We are trying to implement highlighting for wildcard (MultiTerm) queries. This
seems to work fine with the regular highlighter but when we try to use the
fastVectorHighlighter we don't see any results in the highlighting section of
the response.
Hello Adam,
I've managed to create a small POC of integrating Mahout with Solr for a
clustering task, do you want to use it for clustering only or possibly for
other purposes/algorithms?
More generally speaking, I think it'd be nice if Solr could be extended with
a proper API for integrating
Hello all,
I have checked the forums to see if it is possible to create and index from
multiple datasources. I have found references to SOLR-1358, but I don't think
this fits my scenario. In all, we have an application where we upload files. On
the file upload, I use the Tika extract handler
All,
I am at a bit of a loss here so any help would be greatly appreciated. I am
using the DIH to grab data from a DB. The field that I am most interested in
has anywhere from 1 word to several paragraphs worth of free text. What I
would really like to do is pull out phrases like Joe's coffee
Thanks for the reply, Tommaso! I would like to see tighter integration like
in the way Nutch integrates with Solr. There is a single param that you set
which points to the Solr instance. My interest in Mahout is with its
ability to handle large data and find frequency, collocation of data,
Hi Koji,
Thank you for your reply.
It is the feature of FVH. FVH supports TermQuery, PhraseQuery, BooleanQuery
and DisjunctionMaxQuery
and Query constructed by those queries.
Sorry, I'm not sure I understand. Are you saying that FVH supports MultiTerm
highlighting?
Tom
One solution to this problem is to change the order of field operation
(http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations)
to first do fmap.*= processing, then add the fields from literal.*=. Why would
anyone want to rename a field they have just explicitly named
(11/06/10 0:14), Burton-West, Tom wrote:
Hi Koji,
Thank you for your reply.
It is the feature of FVH. FVH supports TermQuery, PhraseQuery, BooleanQuery and
DisjunctionMaxQuery
and Query constructed by those queries.
Sorry, I'm not sure I understand. Are you saying that FVH supports
Hmmm, when you say you use Tika, are you using some custom Java code? Because
if you are, the best thing to do is query your database at that point
and add whatever information
you need to the document.
If you're using DIH to do the crawl, consider implementing a
Transformer to do the database
The problem here is that none of the built-in filters or tokenizers
have a prayer
of recognizing what #you# think are phrases, since it'll be unique to
your situation.
If you have a list of phrases you care about, you could substitute a
single token
for the phrases you care about...
But the
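One way to do that single-token substitution is index-time synonyms (a sketch; the file name and the entry are illustrative):

```
# phrases.txt: map each phrase you care about to a single token
Joe's coffee => joescoffee
```

wired into the index analyzer with something like <filter class="solr.SynonymFilterFactory" synonyms="phrases.txt" ignoreCase="true" expand="false"/>, so the phrase survives tokenization as one term.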
Erick,
I totally understand that BUT the keyword tokenizer factory does a really
good job extracting phrases (or what look like phrases) from my data. I
don't know why exactly but it does do it. I am going to continue working
through it to see if I can't figure it out ;-)
Adam
On Thu, Jun
The KeywordTokenizer doesn't do anything to break up the input stream,
it just treats the whole input to the field as a single token. So I don't think
you'll be able to extract anything starting with that tokenizer.
Look at the admin/analysis page to see a step-by-step breakdown of what
your
Hello Erick,
Thanks for the response. No, I am using the extract handler to extract the data
from my text files. In your second approach, you say I could use DIH to
update the index which would have been created by the extract handler in the
first phase. I thought that, let's say I get info
How are you using it? Streaming the files to Solr via HTTP? You can use Tika
on the client to extract the various bits from the structured documents, and
use SolrJ to assemble various bits of that data Tika exposes into a
Solr document
that you then send to Solr. At the point you're transferring
This thread got me thinking a bit...
Does SOLR support the concept of partial updates to documents? By this I
mean updating a subset of fields in a document that already exists in the
index, and without having to resubmit the entire document.
An example would be storing/indexing user tags
No, from what I understand, the way Solr does an update is to delete the
document, then recreate all the fields; there is no partial updating of the
file... maybe because of performance issues or locking?
-Original Message-
From: David Ross [mailto:davidtr...@hotmail.com]
Sent: 9 juin
Hi,
there seems to be no way to index CSV using the DataImportHandler.
Using a combination of
LineEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor)
and
RegexTransformer (http://wiki.apache.org/solr/DataImportHandler#RegexTransformer)
as
proposed in
Hi,
to make my point more clear: if the CSV has a fixed schema / column layout,
using the RegexTransformer is of course a possibility (however awkward). But
if you want to implement a (more or less) schema free shopping search engine
...
regards
On Thu, Jun 9, 2011 at 9:31 PM, Helmut Hoffer von
I am using the guide found here (
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/)
to build an autocomplete search capability but in my data set I have some
documents which have the same value for the field that is being returned, so
for instance
Helmut,
I recently submitted SOLR-2549
(https://issues.apache.org/jira/browse/SOLR-2549) to handle both fixed-width
and delimited flat files. To be honest, I only needed fixed-width support for
my app so this might not support everything you mention for delimited files,
but it should be a
Ludovic,
how do you index your html files ? I mean do you create fields for
different
parts of your document (for different stop words lists, stemming, etc) ?
with DIH or solrj or something else ?
We are sending them over http, and using Tika to strip the HTML, at
present.
We do not split
On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen
helmut...@googlemail.com wrote:
Hi,
there seems to be no way to index CSV using the DataImportHandler.
Looking over the features you want, it looks like you're starting from
a CSV file (as opposed to CSV stored in a database).
Is
I am not (yet) a tika user, perhaps iorixxx's solution is good for
you.
We will share the highlighter module and 2 other developments soon. (I have
to see how to do that)
Ludovic.
-
Jouve
France.
Hi,
just looked at your code. Definitely an improvement :-)
The problem with the double-quotes is that the delimiter (let's say ',')
might be part of the column value. The goal is to process something like
this without any tricky configuration
name1,name2,name3
val1,val2,...,val3
...
The user
s/provide and/provide any/ig ,-)
On Thu, Jun 9, 2011 at 10:01 PM, Helmut Hoffer von Ankershoffen
helmut...@googlemail.com wrote:
Hi,
just looked at your code. Definitely an improvement :-)
The problem with the double-quotes is, that the delimiter (let's say ',')
might be part of the
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Wednesday, June 08, 2011 11:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Displaying highlights in formatted HTML document
--- On Thu, 6/9/11, Bryan Loofbourrow bloofbour...@knowledgemosaic.com
wrote:
Hi,
yes, it's about CSV files loaded via HTTP from shops to be fed into a
shopping search engine.
The CSV Loader cannot map fields (only field values) etc. DIH is flexible
enough for building the importing part of such a thing but misses elegant
handling of CSV data ...
Regards
On Thu, Jun 9,
On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen
helmut...@googlemail.com wrote:
Hi,
yes, it's about CSV files loaded via HTTP from shops to be fed into a
shopping search engine.
The CSV Loader cannot map fields (only field values) etc.
You can provide your own list of
Hi,
... that would be an option if there is a defined set of field names and a
single column/CSV layout. The scenario however is different csv files (from
different shops) with individual column layouts (separators, encodings
etc.). The idea is to map known field names to defined field names in
OK, I think I see what you're up to. Might be pretty viable
for me as well.
Can you talk about anything in your mappings.txt files that is an
important part of the solution?
It is not important. I just copied it. Plus html strip char filter does not
have mappings parameter. It was a copy
: Here is the error message:
:
: Fieldtype: tdate (I use the default one in solr schema.xml)
: Field value(Index): 2006-12-22T13:52:13Z
: Field value(query): [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z]
: with '[' and ']'
:
: And it generates the result below:
i think the piece of info
On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote:
Hi,
... that would be an option if there is a defined set of field names and a
single column/CSV layout. The scenario however is different csv files (from
different shops) with individual column layouts (separators, encodings
Very informative links and statement, Jonathan. Thank you.
On 6 June 2011 20:55, Jonathan Rochkind rochk...@jhu.edu wrote:
This is a start, for many common best practices:
http://wiki.apache.org/solr/SolrRelevancyFAQ
Many of the questions in there have an answer that involves
On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler kkrugler_li...@transpac.comwrote:
On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote:
Hi,
... that would be an option if there is a defined set of field names and
a
single column/CSV layout. The scenario however is different
Hi,
btw: there seems to be somewhat of a mismatch between the efforts to enhance DIH
regarding the CSV format (James Dyer) and the effort to maintain the
CSVLoader (Ken Krugler). How about merging your efforts and migrating the
CSVLoader to a CSVEntityProcessor (cp. my initial email)? :-)
Best
Yes, I asked the wrong question. What I was subconsciously getting at is
this: how are you avoiding the possibility of getting hits in the HTML
elements? Is that accomplished by putting tag names in your stopwords, or
by some other mechanism?
HtmlStripCharFilter removes html tags. After it
On Jun 9, 2011, at 2:21pm, Helmut Hoffer von Ankershoffen wrote:
Hi,
btw: there seems to somewhat of a non-match regarding efforts to Enhance DIH
regarding the CSV format (James Dyer) and the effort to maintain the
CSVLoader (Ken Krugler). How about merging your efforts and migrating the
I'm exploring SolrCloud for a new project, and have some questions based
upon what I've found so far.
The setup I'm planning is going to have a number of multicore hosts,
with cores being moved between hosts, and potentially with cores merging
as they get older (cores are time based, so once
Thanks for the feedback! This definitely gives me some options to work on!
Mark
On Thu, Jun 9, 2011 at 11:21 PM, Steven A Rowe sar...@syr.edu wrote:
Hi Mark,
Are you familiar with shingles aka token n-grams?
Where can I find the log file of solr? Is it turned on by default? (I use
Jetty)
Thanks
Ruixiang
Hi,
Thank you for your answer.
But... I cannot use a boost calculated offline since the boost will change
depending on the query made.
Each query will boost differently.
Any other ideas?
Jeff
On Jun 9, 2011, at 5:45 PM, Ruixiang Zhang wrote:
Where can I find the log file of solr? (I use
Jetty)
By default, it's in yourapp/solr/logs/solr.log
Is it turned on by default?
Yes. Oh, yes. Very much so. Uh-huh, you betcha.
-==-
Jack Repenning
Technologist
Codesion Business Unit
Here's help on how to setup logging
http://skybert.wordpress.com/2009/07/22/how-to-get-solr-to-log-to-a-log-file/
-
Morris
- Original Message -
From: Ruixiang Zhang rxzh...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 9, 2011 8:45:30 PM GMT -05:00 US/Canada Eastern
Hi Gary,
Similar thing we are doing, but we are not creating an XML doc; rather we
are leaving Tika to extract the content and depending on dynamic fields. We
are not storing the text as well. But not sure if in future that would be
the case.
What about Microsoft Office 2007 and later attachments?
Hi
This is my document
in php
$xmldoc = '<add><doc><field name="id">F_146</field><field
name="userid">74</field><field name="groupuseid">gmail.com</field><field
name="attachment_size">121</field><field
name="attachment_name">sample.pptx</field></doc></add>';
$ch = curl_init("http://localhost:8080/solr/update");
Thanks Erick for your help.
I have another silly question.
Suppose I created multiple fieldTypes e.g. news_English, news_Chinese,
news_Japanese etc.
After creating these fields, can I copy all of these to a copyField defaultquery
like below:
<copyField source="news_English" dest="defaultquery"/>
<copyField
it did not work :(
On Thu, Jun 9, 2011 at 12:53 PM, Bill Bell billnb...@gmail.com wrote:
You have to take the input and splitBy something like ',' to get it into
an array and repost it back to
Solr...
I believe others have suggested that?
On 6/8/11 10:14 PM, Pawan Darira
On Fri, Jun 10, 2011 at 10:36 AM, Pawan Darira pawan.dar...@gmail.com wrote:
it did not work :(
[...]
Please provide more details of what you tried, what was the error, and
any error messages that you got. Just saying that it did not work makes
it pretty much impossible for anyone to help you.
Hi,
curl http://localhost:8983/solr/update?commit=true -H "Content-Type:
text/xml" --data-binary '<add><doc><field
name="id">testdoc</field></doc></add>'
Regards
Naveen
On Fri, Jun 10, 2011 at 10:18 AM, Naveen Gupta nkgiit...@gmail.com wrote:
Hi
This is my document
in php
$xmldoc = '<add><doc><field
Hi,
Basically I need to post something like this using curl in php.
The example explained in the earlier thread:
curl http://localhost:8983/solr/update?commit=true -H "Content-Type:
text/xml" --data-binary '<add><doc><field
name="id">testdoc</field></doc></add>'
Should we need to create a temp file and
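Presumably no temp file is needed: PHP's curl can post a string directly via CURLOPT_POSTFIELDS. A sketch against the URL from the earlier example (untested):

```php
<?php
// Build the XML document as a plain string.
$xmldoc = '<add><doc><field name="id">testdoc</field></doc></add>';

$ch = curl_init('http://localhost:8983/solr/update?commit=true');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
// CURLOPT_POSTFIELDS accepts the string itself, so no temp file is required.
curl_setopt($ch, CURLOPT_POSTFIELDS, $xmldoc);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
```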
I am also planning to move to SolrCloud;
since it's still under development, I am not sure about its behavior in
Production.
Please update us once you find it stable.
On 10 June 2011 03:56, Upayavira u...@odoko.co.uk wrote:
I'm exploring SolrCloud for a new project, and have some questions