Hi. I'm a beginner who wants to build a search system using Solr 1.4.1 and
Lucene 2.9.2.
I get a correct Lucene query from my custom Analyzer and filter for a given
query,
but no results are displayed.
Here is my Analyzer source.
Are you using KLTQueryAnalyzer outside of Solr (as a pre-processing step)?
Or have you defined a fieldType in schema.xml that uses KLTQueryAnalyzer?
Can you append debugQuery=on to your search URL and paste the output?
--- On Mon, 10/18/10, Jerad ag...@naver.com wrote:
Hi,
you can try to parse the XML in Java yourself and then push the
SolrInputDocuments via SolrJ to Solr.
Setting the format to binary + using the streaming update server should
improve performance, but I am not sure... and performant (+ less mem!) XML
reading in Java is another topic ... ;-)
You'll have to supply your dates in a format Solr expects (e.g.
2010-10-19T08:29:43Z
and not 2010-10-19). If you don't need millisecond granularity you can use
the DateMath syntax to specify that.
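For reference, one way to produce that expected format in plain Java (the class and method names here are just for illustration, not anything from Solr itself):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateFormat {
    // Formats a java.util.Date into the ISO-8601 "Zulu" form Solr expects,
    // e.g. 2010-10-19T08:29:43Z. The time zone must be UTC.
    public static String toSolrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        // Epoch millis for 2010-10-19T08:29:43Z
        System.out.println(toSolrDate(new Date(1287476983000L)));
    }
}
```

This is the conversion step the poster below describes doing "via code" before sending dates to Solr.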
Please, also check http://wiki.apache.org/solr/SolrQuerySyntax.
On 17 October 2010 10:54, nedaha
An HTTP GET call is simply made by entering the URL into your browser, as
shown in the example in the wiki:
http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schema_file_name.xml&dataDir=data
Thanks for your reply.
I know about the Solr date format!! Check-in and check-out dates are in a
user-friendly format that we use in our search form for the system's users,
and I change the format via code and then send them to Solr.
I want to know how I can make a query to compare a range between
Oops, I'm sorry! I found some mistakes in the previously posted source (the
main class name was wrong :)
This is the correct analyzer source.
---
public class MyCustomQueryAnalyzer extends Analyzer{
Hello all,
I have a field in my schema which holds the number of votes a document
has. How can I boost documents based on that number?
Something like: the one with the maximum number gets a boost of 10, the one
with the smallest number gets 0.5, and the values in between get
calculated
ok, maybe I don't get this right..
are you trying to match something like check-in date 2010-10-19T00:00:00Z
AND check-out date 2010-10-21T00:00:00Z *or* check-in date =
2010-10-19T00:00:00Z
AND check-out date = 2010-10-21T00:00:00Z?
On 18 October 2010 10:05, nedaha neda...@gmail.com wrote:
Hi All,
I am planning to have a separate server for Solr, and regarding
hardware requirements I am unsure what configuration is needed.
I know it will be hard to tell, but I just need a minimum requirement for the
particular situation as follows:
1) There are 1000 regular
The exact query that i want is:
check-in date = 2010-10-19T00:00:00Z
AND check-out date = 2010-10-21T00:00:00Z
but because of the structure that I have to index, I don't have a specific
start date and end date in my Solr index to compare with the check-in and
check-out date range. I have some dates
I asked this myself ... here could be some pointers:
http://lucene.472066.n3.nabble.com/SolrJ-and-Multi-Core-Set-up-td1411235.html
http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-in-Single-Core-td475238.html
Hi everyone,
I'm trying to write some code for creating and using multi cores.
ok, I see now..well, the only query that comes to mind is something like:
check-in date:[2010-10-19T00:00:00Z TO *] AND check-out date:[* TO
2010-10-21T00:00:00Z]
would something like that work?
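As a quick sketch, that query could be assembled in Java like this (the underscored field names are my own placeholders, since Solr field names can't contain spaces as written in the thread):

```java
public class AvailabilityQuery {
    // Builds the Lucene-syntax query suggested above: documents whose
    // check-in field is on/after the requested check-in date AND whose
    // check-out field is on/before the requested check-out date.
    public static String build(String checkIn, String checkOut) {
        return "checkin_date:[" + checkIn + " TO *]"
             + " AND checkout_date:[* TO " + checkOut + "]";
    }

    public static void main(String[] args) {
        System.out.println(build("2010-10-19T00:00:00Z", "2010-10-21T00:00:00Z"));
    }
}
```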
On 18 October 2010 11:04, nedaha neda...@gmail.com wrote:
The exact query that i want is:
rawquerystring = +body:flyaway
parsedquery = +body:fly +body:away
shows that your custom filter is working as you expected.
However, you are using different tokenizers at query time (StandardTokenizer,
hard-coded) and at index time (WhitespaceTokenizer). That may cause numFound=0.
For example if your
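The index/query mismatch can be illustrated (very roughly, in plain Java string splitting; these are not the real Lucene tokenizers) like this:

```java
import java.util.Arrays;
import java.util.List;

public class TokenizerMismatch {
    // Whitespace-style tokenization, like the index side in the thread:
    // only splits on whitespace, so punctuation stays inside the token.
    static List<String> whitespace(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    // StandardTokenizer-style tokenization, crudely approximated here by
    // splitting on any non-letter/non-digit character.
    static List<String> standardish(String text) {
        return Arrays.asList(text.toLowerCase().split("[^\\p{L}\\p{N}]+"));
    }

    public static void main(String[] args) {
        String text = "fly-away";
        // The index holds one term, but the query looks for two different
        // terms, so nothing matches and numFound=0.
        System.out.println(whitespace(text));
        System.out.println(standardish(text));
    }
}
```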
I have a field in my schema which holds the number of votes
a document
has. How can I boost documents based on that number?
you can do it with http://wiki.apache.org/solr/FunctionQuery
I know but I can't figure out what functions to use. :)
On Mon, Oct 18, 2010 at 1:38 PM, Ahmet Arslan iori...@yahoo.com wrote:
I have a field in my schema which holds the number of votes
a document
has. How can I boost documents based on that number?
you can do it with
Hi!
I'm trying to implement some kind of search suggestion on a search
engine I have implemented. These search suggestions should not be
automatic like the ones described for the SpellCheckComponent [1].
I'm looking for something like:
SAS oppositions = Public job offers for some-company
So
Thanks.
Not really the answer I wanted to hear, but at least I know this is not my
fault ;)
Regards
Thomas
Erick Erickson, 15.10.2010 20:42:
This is actually known behavior. The problem is that when you update
a document, it's deleted and re-added, but the original is marked as
deleted.
Hello Lance, thank you for your reply.
I created the following JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2171, as suggested.
Can you tell me how new issues are handled by the development teams,
and whether there's a way I could help/contribute ?
--
Tanguy
2010/10/16 Lance Norskog
Just following up to see if anybody might have some words of wisdom on the
issue?
Thank you,
Ken
It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, The Hitchhikers Guide to the Galaxy
On Fri,
Also, the little-advertised DIH debug page can help, see:
solr/admin/dataimport.jsp
Best
Erick
On Sun, Oct 17, 2010 at 11:56 AM, William Pierce evalsi...@hotmail.comwrote:
Two suggestions: a) Noticed that your DIH spec in the solrconfig.xml seems
to refer to db-data-config.xml but you
I think if you look closely you'll find the date quoted in the Exception
report doesn't match any of the declared formats in the schema. I would
suggest, as a first step, hunting through your data to see where that date
is coming from.
-Mike
-Original Message-
From: Ken Stanley
Hi, Gora
I haven't yet tried indexing a huge amount of XML files through curl or pure
Java (like post.jar).
Is indexing through XML really fast?
How many files did you index? And how did you do it (using curl or pure Java)?
Thanks, Gora
Well, always get the biggest, fastest machine you can &lt;g&gt;...
On a serious note, you're right, there's not much info to go
on here. And even if there were more info, Solr performance
depends on how you search your data as well as how much
data you have...
About the only way you can really tell is
The beauty/problem with open source is issues are picked up when
somebody thinks they're important enough and has the time/energy
to work on it. And that person can be you &lt;g&gt;...
What usually happens is that someone submits a patch, various
people comment on it, look it over, ask for changes or
I know but I can't figure out what
functions to use. :)
Oh, I see. Why not just use {!boost b=log(vote)}?
Maybe scale(vote,0.5,10)?
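For reference, Solr's scale() function linearly maps the field's values into the target range; the arithmetic is roughly this (a sketch of the mapping, not Solr's actual code):

```java
public class ScaleBoost {
    // Linearly maps x from [min, max] into [targetMin, targetMax], which is
    // what scale(vote, 0.5, 10) does using the observed min and max of the
    // vote field across the whole index.
    static double scale(double x, double min, double max,
                        double targetMin, double targetMax) {
        return targetMin + (x - min) * (targetMax - targetMin) / (max - min);
    }

    public static void main(String[] args) {
        // If votes range from 0 to 100 in the index:
        System.out.println(scale(0, 0, 100, 0.5, 10));   // fewest votes -> 0.5
        System.out.println(scale(100, 0, 100, 0.5, 10)); // most votes  -> 10.0
        System.out.println(scale(50, 0, 100, 0.5, 10));  // midway      -> 5.25
    }
}
```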
On Mon, Oct 18, 2010 at 5:26 PM, Jason, Kim hialo...@gmail.com wrote:
Hi, Gora
I haven't yet tried indexing a huge amount of XML files through curl or pure
Java (like post.jar).
Is indexing through XML really fast?
How many files did you index? And how did you do it (using curl or pure Java)?
[...]
Hi,
here is some more info about it. I use Solr to output only the file
names (file ids). Here I enclose the fields in my schema.xml; presently I
have only about 40MB of indexed data.
<field name="id" type="string" indexed="true" stored="true"
required="true" />
<field name="sku" type="textTight"
Recommend using the pdate format for faster range queries.
Here's how (or one way) to do a range query in solr
defType=lucene&q=some_field:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]
Does that answer your question? I don't really understand what you're trying
to do with your two
On Mon, Oct 18, 2010 at 7:52 AM, Michael Sokolov soko...@ifactory.comwrote:
I think if you look closely you'll find the date quoted in the Exception
report doesn't match any of the declared formats in the schema. I would
suggest, as a first step, hunting through your data to see where that
Thanks Peter. That helps a lot. It's weird that this is not documented anywhere. :(
On Mon, Oct 18, 2010 at 3:42 PM, Peter Karich peat...@yahoo.de wrote:
I asked this myself ... here could be some pointers:
http://lucene.472066.n3.nabble.com/SolrJ-and-Multi-Core-Set-up-td1411235.html
On Mon, Oct 18, 2010 at 10:12 AM, Tharindu Mathew mcclou...@gmail.com wrote:
Thanks Peter. That helps a lot. It's weird that this is not documented anywhere.
:(
Feel free to edit the wiki :)
Do you already have the files as Solr XML? If so, I don't think you need SolrJ.
If you need to build SolrInputDocuments from your existing structure,
solrj is a good choice. If you are indexing lots of stuff, check the
StreamingUpdateSolrServer:
Thank you for the reply, Gora.
But I still have several questions.
Did you use separate indexes?
If so, you indexed 0.7 million XML files per instance
and merged them. Is that right?
Please let me know how multiple instances and cores work in your case.
Regards,
You know about the 'invariant' that can be set in the request handler,
right? Not sure if that will do for you or not, but sounds related.
Added recently to some wiki page somewhere, although the feature has been
there for a long time. Let's see if I can find the wiki page... Ah yes:
Thanks for your reply. But as i replied the following to Erick's suggestion
which is quite the same:
Yes, we're using it but the problem is that there can be many fields
and that means quite a large list of parameters to set for each request
handler, and there can be many request handlers.
I have an indexing pipeline that occasionally needs to check if a
document is already in the index (even if not committed yet).
Any suggestions on how to do this without calling <commit/> before each check?
I have a list of document ids and need to know which ones are in the
index (actually I need
Hi, I'm new to the mailing list.
I'm implementing Solr in my current job, and I'm having some problems.
I was testing the consistency of commits. I found, for example, that if
we add X documents to the index (without committing) and then we restart the
service, the documents are committed. They
The documents should be implicitly committed when the Lucene index is
closed.
When you perform a graceful shutdown, the Lucene index gets closed and the
documents get committed implicitly.
When the shutdown is abrupt as in a KILL -9, then this does not happen and
the updates are lost.
You can
Hi all
I have a huge amount of XML files to index.
I want to index using the SolrJ binary format to get a performance gain,
because I heard that indexing via XML files is quite slow.
But I don't know how to index through the SolrJ binary format and can't find
examples.
Please give some help.
Is there interest in having a Meetup at ApacheCon? Who's going? Would anyone
like to present? We could do something less formal, too, and just have drinks
and QA/networking. Thoughts?
-Grant
Hi,
I am looking for a quick solution to improve a search engine's spell checking
performance. I was wondering if anyone has tried to integrate the Google
SpellCheck API with the Solr search engine (if possible). Google spellcheck
came to my mind for two reasons. First, it is costly to clean up
I understand, but I want to have control over what is committed or not.
In our scenario, we want to add documents to the index and maybe trigger the
commit an hour later.
If in the middle we have a server shutdown or any process sending a
shutdown signal to the process, I don't want those documents
No.. you would just turn autocommit off, and have the thread that is
doing updates to your indexes commit every hour. I'd think that this
would take care of the scenario that you are describing.
Matt
On 10/18/2010 3:50 PM, Ezequiel Calderara wrote:
I understand, but i want to have control
But if something happens within that hour, I will have lost the documents or
committed them to the index off schedule.
How can I handle this scenario?
I think that Solr (or Lucene) should ensure the durability
(http://en.wikipedia.org/wiki/Durability_(database_systems)) of
the data even
Oops, never mind. Just read Google API policy. 1000 queries per day limit for
non-commercial use only.
-Original Message-
From: Xin Li
I'll see if I can resolve this by adding an extra core with the same schema
to hold these documents.
So Core0 will act as a queue and Core1 will be the real index, and
the commit on Core0 will trigger an add to Core1 and its commit.
That way I can be sure of not losing data.
It
In general, the benefit of the built-in Solr spellcheck is that it can
use a dictionary based on your actual index.
If you want to use some external API, you certainly can, in your actual
client app -- but it doesn't really need to involve Solr at all anymore,
does it? Is there any benefit
I think a spellchecker based on your index has clear advantages. You can
spellcheck words specific to your domain which may not be available in an
outside dictionary. You can always dump the list from wordnet to get a
starter english dictionary.
But then it also means that misspelled words from
If you know the misspellings you could prevent them from being added to the
dictionary with a StopFilterFactory like so:
<fieldType name="textSpell" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter
We need to index documents where the fields in the document can change
frequently.
It appears that we would need to update our Solr schema definition before we
can reindex using new fields.
Is there any way to make the Solr schema optional?
--frank
Hello guys,
I need to index the first character of the field autor in another field,
inicialautor.
Example:
autor = Mark Webber
inicialautor = M
I wrote a JavaScript function in the dataimport, but the field inicialautor
indexes empty.
The function:
function InicialAutor(linha) {
Hi Frank,
Check out the Dynamic Fields option from here
http://wiki.apache.org/solr/SchemaXml
Tim
-Original Message-
From: Frank Calfo [mailto:fca...@aravo.com]
Do we need an admin screen for spellchecker? Where you can browse the words
and delete the ones you don't like so that they don't get suggested?
You can cross the new words against a dictionary and keep them in the file
as Jason described...
What Pradeep said is true; it is always better to have suggestions related to
your index than to have suggestions with no results...
On Mon, Oct 18, 2010 at 6:24 PM, Jason Blackerby
How are you declaring the transformer in the dataconfig?
On Mon, Oct 18, 2010 at 6:31 PM, Renato Wesenauer
renato.wesena...@gmail.com wrote:
Hello guys,
I need to index the first character of the field autor in another
field
inicialautor.
Example:
autor = Mark Webber
inicialautor
You can use regular expression based template transformer without writing a
separate function. It's pretty easy to use.
On Mon, Oct 18, 2010 at 2:31 PM, Renato Wesenauer
renato.wesena...@gmail.com wrote:
Hello guys,
I need to index the first character of the field autor in another
field
I was thinking about it; you would also need to mark a word as valid, so it
doesn't get marked as wrong.
On Mon, Oct 18, 2010 at 6:37 PM, Pradeep Singh pksing...@gmail.com wrote:
Do we need an admin screen for spellchecker? Where you can browse the words
and delete the ones you don't like so that
Frank Calfo wrote:
We need to index documents where the fields in the document can change
frequently.
It appears that we would need to update our Solr schema definition before we
can reindex using new fields.
Is there any way to make the Solr schema optional?
No. But you can design your
You can just do this with a copyField in your schema.xml instead. Copy
to a field which uses a regex filter or some other analyzer to limit it to the
first non-whitespace char (and perhaps force uppercase too if you want).
That's what I'd do; it's easier, and it will work if you index to Solr from
something other
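Whether done by a copyField analyzer or a DIH transformer, the intended result is simply the first non-whitespace character, upper-cased. A minimal plain-Java sketch of that logic (names are my own, for illustration):

```java
public class InicialAutor {
    // Returns the first non-whitespace character of the author name,
    // upper-cased, or "" when there is nothing to extract.
    static String inicial(String autor) {
        String trimmed = autor == null ? "" : autor.trim();
        return trimmed.isEmpty() ? "" : trimmed.substring(0, 1).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(inicial("Mark Webber"));   // M, as in the example
        System.out.println(inicial("  ayrton senna")); // A
    }
}
```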
This exact topic was just discussed a few days ago...
http://search.lucidimagination.com/search/document/7b6e2cc37bbb95c8/faceting_and_first_letter_of_fields#3059a28929451cb4
My comments on when/where it makes sense to put this logic...
Hi All,
I am indexing a web application with approximately 9500 distinct URLs and
their contents using Nutch and Solr.
I use Nutch to fetch the URLs and links and to crawl the entire web
application to extract all the content for all pages.
Then I run the solrindex command to send the content to Solr.
Thanks for your reply :)
1. I tested q=*:*&fl=body; 1 doc was returned as a result, as I expected.
2. I edited my schema.xml as you instructed:
<analyzer type="query"
    class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer">
  <!-- No filter description. -->
</analyzer>
but no result
I've installed Solr a hundred times using Tomcat (on Windows) but now need to
get it going with WebSphere (on Windows). For whatever reason this seems to be
black magic :) I've installed the war file but have no idea how to set Solr
home to let WebSphere know where the index and config files
You need to make sure that the following system property is one of the
values specified in the JAVA_OPTS environment variable:
-Dsolr.solr.home=path_to_solr_home
On Mon, Oct 18, 2010 at 10:20 PM, Kevin Cunningham
kcunning...@telligent.com wrote:
I've installed Solr a hundred times using
I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is
this possible to do? If so, could someone give me a tip or two on
getting started?
Thanks,
Matt
Once you've built the Solr 4.0 jar, you can use mvn's install command
like this:
mvn install:install-file -DgroupId=org.apache -DartifactId=solr
-Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar
-DgeneratePom=true
@tommychheng
On 10/18/10 7:28 PM, Matt Mitchell wrote:
I'd
The first question to ask is will it work for you.
The SECOND question is do you want google to know what's in your data?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so
I would love to go, but funds are low right now. NEXT year, I'd have something
to demo though :-)
Dennis Gearon
When I get my site which uses Solr/Lucene going, is it considered polite to
post a small paragraph about it with a link?
Dennis Gearon
Hi Dennis,
There is a PoweredBy page on the Wiki that's good for that.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Solr requires a schema.
But Lucene does not! :)
Otis
Cool, thanks!
Dennis Gearon
Hi Israel,
You can use this: http://search-lucene.com/?q=boilerpipe&fc_project=Tika
Not sure if it's built into Nutch, though...
Otis
Is there something in Solr/Lucene that could give me the equivalent to:
SELECT
COUNT(*)
WHERE
date_column1 :start_date AND
date_column2 :end_date;
Providing I take into account deleted documents, of course (I.E., do some sort
of averaging or some tracking function over time.)
Dennis
: There is a PoweredBy page on the Wiki that's good for that.
Even better is a post to the list telling folks about your use case,
index size, hardware, etc.
A lot of new users find that information really helpful for comparison.
-Hoss
:
: SELECT
: COUNT(*)
: WHERE
: date_column1 :start_date AND
: date_column2 :end_date;
q=*:*&fq=column1:[start TO *]&fq=column2:[end TO *]&rows=0
...every result includes a total count.
-Hoss