Hi Paul,
it's working for queries, but not for updates (addBean). The getter method
returns a Calendar (a GregorianCalendar instance).
On the indexer side, a toString() or something equivalent is applied and an
error is thrown:
Caused by: java.text.ParseException: Unparseable date:
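Not part of the thread, but a common fix for this class of problem, sketched with a hypothetical bean (the SolrJ @Field annotation is omitted so the sketch stays dependency-free): keep the Calendar internally but expose java.util.Date, which SolrJ's bean binder handles, instead of the raw Calendar.

```java
import java.util.Calendar;
import java.util.Date;

// Hypothetical bean illustrating the workaround: the indexer never
// sees the Calendar, only a java.util.Date it knows how to serialize.
public class Article {
    private Calendar created = Calendar.getInstance();

    public Date getCreated() {
        return created.getTime();
    }

    public void setCreated(Date date) {
        Calendar c = Calendar.getInstance();
        c.setTime(date);
        created = c;
    }
}
```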
Hello list,
I'm new to Solr but from what I've tried so far, it's awesome.
I have a small issue regarding the highlighting feature.
It finds stuff (as I see from the query analyzer), but the highlight list
looks something like this:
<lst name="highlighting">
  <lst name="c:\0596520107.pdf"/>
</lst>
Michael wrote:
I've got a process external to Solr that is constantly feeding it new
documents, retrying if Solr is not responding. What's the right way to
stop Solr (running in Tomcat) so no documents are lost?
Currently I'm committing all cores and then running catalina's stop
script, but
Some extra for the pros list:
- Full control over which content is searchable and which is not.
- Possibility to make pages searchable almost instantly after publication
- Control over when the site is indexed
Friendly
Jan-Eirik
On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček lukas.vl...@gmail.com
Next to the faceting engine:
- MoreLikeThis
- Highlighting
- Spellchecker
But also more flexible querying using the DisMax handler, which is
clearly superior. Solr can also be used to store data which can be
retrieved in an instant! We have used this technique in a site and it is
obviously much
I found the solution.
If somebody will run into the same problem, here is how I solved it.
- while uploading the document:
req.setParam("uprefix", "attr_");
req.setParam("fmap.content", "attr_content");
req.setParam("overwrite", "true");
req.setParam("commit",
Jan-Eirik B. Nævdal wrote:
Some extra for the pros list:
- Full control over which content is searchable and which is not.
- Possibility to make pages searchable almost instantly after publication
- Control over when the site is indexed
+1 especially the last point
you can also add a robots.txt and
Lukáš Vlček wrote:
I am looking for good arguments to justify implementing search for sites
which are available on the public internet. There are many sites in the
Powered by Solr section which are indexed by Google and other search
engines, but still they decided to invest resources into
Hi,
thanks for inputs so far... however, let's put it this way:
When you need to search for something Lucene or Solr related, which one do
you use:
- generic Google
- go to a particular mail list web site and search from here (if there is
any search form at all)
- go to LucidImagination.com and
Morning all,
I'm having problems with joining a child entity from one database to a
parent from another...
My entity definitions look like this (names changed for brevity):
<entity name="parent" dataSource="db1" query="select a, b, c from
parent_table">
<entity name="child" dataSource="db2"
Lukáš Vlček wrote:
When you need to search for something Lucene or Solr related, which one do
you use:
- generic Google
- go to a particular mail list web site and search from here (if there is
any search form at all)
Both of these (Nabble in the second case) in case any recent posts
For this list I usually end up @ http://solr.markmail.org (which I believe also
uses Lucene under the hood)
Google is such a black box ...
Pros:
+ 1 Open Source (enough said :-)
There also seems to always be the notion that crawling lends itself to
producing the best results, but that is rarely
Any ideas on this? Is it worth sending a bug report?
Those links are live, by the way, in case anyone wants to verify that MLT is
returning suggestions with very low tf.idf.
Cheers,
Andrew.
Andrew Clegg wrote:
Hi,
If I run a MoreLikeThis query like the following:
no obvious issues.
you may post your entire data-config.xml
do without CachedSqlEntityProcessor first and then apply it later
On Fri, Nov 13, 2009 at 4:38 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
Morning all,
I'm having problems with joining a child entity from one database to a
Hi,
we are using the following entry in schema.xml to make a copy of one type of
dynamic field to another :
<copyField source="*_s" dest="*_str_s"/>
Is it possible to exclude some fields from copying?
We are using Solr 1.3
~Vikrant
Noble Paul നോബിള് नोब्ळ्-2 wrote:
no obvious issues.
you may post your entire data-config.xml
Here it is, exactly as last attempt but with usernames etc. removed.
Ignore the comments and the unused FileDataSource...
http://old.nabble.com/file/p26335171/dataimport.temp.xml
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.
It would be wonderful if from Java we could simply set a per-thread
IO priority, but, it'll be a looong time until that's possible.
So I think for now we should make a Directory impl that emulates such
Nope. It has to be manually ported. Not so much because of the language
itself but because of differences in the libraries.
2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com
Is there any tool to directly port Java to .NET? Then we could extract
out the client part of the javabin code
Hi Andrew,
no idea, I'm afraid - but could you send the output of
interestingTerms=details?
This at least would show what MoreLikeThis uses, in comparison to the
TermVectorComponent you've already pasted.
Chantal
Andrew Clegg wrote:
Any ideas on this? Is it worth sending a bug report?
Another thing to try, is reducing the maxThreadCount for
ConcurrentMergeScheduler.
It defaults to 3, which I think is too high -- we should change this
default to 1 (I'll open a Lucene issue).
Mike
On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn jlqu...@us.ibm.com wrote:
Hi, everyone, this is
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
luc...@mikemccandless.com wrote:
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.
Presumably this prioritizing Directory impl could wrap/decorate any
existing Directory.
Mike
The javabin format does not have many dependencies. It may have 3-4
classes and that is it.
On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer
mauricioschef...@gmail.com wrote:
Nope. It has to be manually ported. Not so much because of the language
itself but because of differences in the
Chantal Ackermann wrote:
no idea, I'm afraid - but could you send the output of
interestingTerms=details?
This at least would show what MoreLikeThis uses, in comparison to the
TermVectorComponent you've already pasted.
I can, but I'm afraid they're not very illuminating!
Hello all,
is there support for non-English language content indexing in Solr?
I'm interested in Bulgarian, Hungarian, Romanian and Russian.
Best regards,
Chuck
The included Snowball filters support Hungarian, Romanian, and Russian.
On Fri, Nov 13, 2009 at 9:03 AM, Chuck Mysak chuck.my...@gmail.com wrote:
Hello all,
is there support for non-English language content indexing in Solr?
I'm interested in Bulgarian, Hungarian, Romanian and Russian.
Hi Andrew,
your URL does not include the parameter mlt.boost. Setting that to
true made a noticeable difference for my queries.
If not, there is also the parameter
mlt.minwl
minimum word length below which words will be ignored.
All your other terms seem longer than 3, so it would help in
I meant the standard IO libraries. They are different enough that the code
has to be manually ported. There were some automated tools back when
Microsoft introduced .Net, but IIRC they never really worked.
Anyway it's not a big deal, it should be a straightforward job. Testing it
thoroughly
Hi,
I'm trying to figure out if there is an easy way to reset any doc boosts
you have made (for analytical purposes) ... for example if I run an index,
gather a report, boost docs based on the report, and reset the boosts at
the time of the next index ...
It would seem to be from just
I'm getting the same thing. The process runs, seemingly successfully, and I
can even go to other SOLR pages pointing to the same server and run queries
against the index that return these just-added entries. But the response to the
original import says failed and rollback both through the XML
Hello.
I am working with Tika 0.5 and want to scan a folder tree of about 10GB.
Is there a comfortable way to scan folders recursively with an existing
class, or do I have to write it myself?
Any tips for best practice?
Greetings, Peter
Have one thread recursing depth first down the directories adding to
a queue (fixed size).
Have many threads reading off of the queue and doing the work.
-glen
http://zzzoot.blogspot.com/
2009/11/13 Peter Gabriel zarato...@gmx.net:
Hello.
I am working with Tika 0.5 and want to scan a folder
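The two-thread pattern Glen describes (one depth-first producer feeding a fixed-size queue, many consumer threads doing the work) might be sketched like this; the Tika-parse/Solr-add step is replaced by collecting paths, and all names here are made up:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Set;
import java.util.concurrent.*;
import java.util.stream.Stream;

public class FolderScanner {
    // Poison pill telling the workers there is nothing left to process.
    private static final Path DONE = Paths.get("");

    public static Set<Path> scan(Path root, int workers) throws Exception {
        BlockingQueue<Path> queue = new ArrayBlockingQueue<>(100); // fixed size
        Set<Path> processed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    for (Path p = queue.take(); p != DONE; p = queue.take()) {
                        processed.add(p); // stand-in for Tika parse + Solr add
                    }
                    queue.put(DONE); // re-insert the pill for the next worker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Single producer: walk the tree depth-first, blocking when the
        // queue is full so memory stays bounded.
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                try {
                    queue.put(p);
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
        }
        queue.put(DONE);
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IOException("workers did not finish in time");
        }
        return processed;
    }
}
```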
On Fri, Nov 13, 2009 at 4:32 AM, gwk g...@eyefi.nl wrote:
I don't know if this is the best solution, or even if it's applicable to
your situation but we do incremental updates from a database based on a
timestamp, (from a simple separate sql table filled by triggers so deletes
Thanks, gwk!
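The timestamp approach gwk describes is essentially what DataImportHandler's delta import formalizes; a sketch with hypothetical table and column names (`item`, `last_modified`), not from the thread:

```xml
<entity name="item" pk="id"
        query="select id, title from item"
        deltaQuery="select id from item
                    where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select id, title from item
                          where id = '${dataimporter.delta.id}'"/>
```

Running the handler with command=delta-import then picks up only rows changed since the last run.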
I want to build an AND search query against field1 AND field2 etc. Both these
fields are stored in an index. I am migrating lucene code to Solr. Following
is my existing lucene code
BooleanQuery currentSearchingQuery = new BooleanQuery();
currentSearchingQuery.add(titleDescQuery,Occur.MUST);
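For reference, Occur.MUST clauses map onto leading + operators in Solr's query syntax; a minimal dependency-free sketch (field names are hypothetical) that builds the string you would pass to SolrQuery.setQuery():

```java
public class AndQueryExample {
    // Builds the query-syntax equivalent of a BooleanQuery whose clauses
    // are all added with Occur.MUST: every term is mandatory.
    public static String andQuery(String f1, String v1, String f2, String v2) {
        return "+" + f1 + ":" + v1 + " +" + f2 + ":" + v2;
    }
}
```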
Hey,
I am interested in using LocalSolr to go Local/Geo/Spatial/Distance
search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr)
points to pretty old documentation. Is there a better document I refer to
for the setting up of LocalSolr and some performance analysis?
Just
Dive in - http://wiki.apache.org/solr/Solrj
Cheers
Avlesh
On Fri, Nov 13, 2009 at 9:39 PM, javaxmlsoapdev vika...@yahoo.com wrote:
I want to build an AND search query against field1 AND field2 etc. Both these
fields are stored in an index. I am migrating lucene code to Solr.
Following
is my
It looks like solr+spatial will get some attention in 1.5, check:
https://issues.apache.org/jira/browse/SOLR-1561
Depending on your needs, that may be enough. More robust/scalable
solutions will hopefully work their way into 1.5 (any help is always
appreciated!)
On Nov 13, 2009, at
Also:
https://issues.apache.org/jira/browse/SOLR-1302
On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote:
Hey,
I am interested in using LocalSolr to go Local/Geo/Spatial/Distance
search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr
)
points to pretty old documentation. Is
Hi there!
How can we retrieve the complete list of dynamic fields, which are currently
available in index?
Thank you in advance!
--
Eugene N Dzhurinsky
For a starting point, this might be a good read -
http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query
Cheers
Avlesh
On Fri, Nov 13, 2009 at 10:02 PM, javaxmlsoapdev vika...@yahoo.com wrote:
I already did dive in before. I am using solrj API and
Anyone?
Original Message
Date: Thu, 12 Nov 2009 13:29:20 +0100
From: gistol...@gmx.de
To: solr-user@lucene.apache.org
Subject: Return doc if one or more query keywords occur multiple times
Hello,
I am using Dismax request handler for queries:
...select?q=foo bar
Luke Handler? - http://wiki.apache.org/solr/LukeRequestHandler
/admin/luke?numTerms=0
Cheers
Avlesh
On Fri, Nov 13, 2009 at 10:05 PM, Eugene Dzhurinsky b...@redwerk.comwrote:
Hi there!
How can we retrieve the complete list of dynamic fields, which are
currently
available in index?
Thank
I think I found the answer. needed to read more API documentation :-)
you can do it using
solrQuery.setFilterQueries() and build AND queries of multiple parameters.
Avlesh Singh wrote:
For a starting point, this might be a good read -
you can do it using
solrQuery.setFilterQueries() and build AND queries of multiple parameters.
Nope. You would need to read more -
http://wiki.apache.org/solr/FilterQueryGuidance
For your impatience, here's a quick starter -
# AND between two fields
solrQuery.setQuery("+field1:foo
AFAIK there is no way to reset the doc boost. You would need to re-index.
Moreover, there is no way to search by boost.
Cheers
Avlesh
On Fri, Nov 13, 2009 at 8:17 PM, Jon Baer jonb...@gmail.com wrote:
Hi,
I'm trying to figure out if there is an easy way to basically reset all of
any doc
Hi Ian and Ryan,
Thanks for the reply.
Ian, I checked your pasted config, I am using the same one except the
values of <int name="startTier">4</int> <int name="endTier">25</int>.
Basically I use the setup specified at http://www.gissearch.com/localsolr.
But I still get the same error I pasted in
The process initially completes with:
<str name="Full Dump Started">2009-11-13 09:40:46</str>
<str name="">Indexing completed. Added/Updated: 20 documents. Deleted
0 documents.</str>
...but then it fails with:
<str name="Full Dump Started">2009-11-13 09:40:46</str>
<str name="">Indexing failed.
great. thanks. that was helpful
Avlesh Singh wrote:
you can do it using
solrQuery.setFilterQueries() and build AND queries of multiple
parameters.
Nope. You would need to read more -
http://wiki.apache.org/solr/FilterQueryGuidance
For your impatience, here's a quick starter -
#and
Peter - if you want, download the code from Lucene in Action 1 or 2, it has
index traversal and indexing. 2nd edition uses Tika.
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
- Original Message
On Thu, Nov 12, 2009 at 3:00 PM, Stephen Duncan Jr stephen.dun...@gmail.com
wrote:
On Thu, Nov 12, 2009 at 2:54 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:
oh man, so you were parsing the Stored field values of every matching doc
at query time? ouch.
Assuming i'm understanding
If documents are being added to and removed from an index (and commits
are being issued) while a user is searching, then the experience of
paging through search results using the obvious solr mechanism
(start=100&rows=10) may be disorienting for the user. For one
example, by the time the user
tpunder wrote:
Maybe I misunderstand what you are trying to do (or the facet.query
feature). If I did an initial query on my data-set that left me with the
following questions:
...
: On the CoreAdmin wiki page. thanks
FWIW: The only time the string schemaName appears on the CoreAdmin wiki
page is when it mentions that solr.core.schemaName is a property that is
available to cores by default.
the documentation for core specifically says...
The core tag accepts the
Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM:
Ah, the pains of optimization. It's kind of just how it is. One solution
is to use two boxes and replication - optimize on the master, and then
queries only hit the slave. Out of reach for some though, and adds many
: which documents have been updated before a successful commit. Now
: stopping solr is as easy as kill -9.
please don't kill -9 ... it's grossly overkill, and doesn't give your
servlet container a fair chance to clean things up. A lot of work has been
done to make Lucene indexes robust to
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
luc...@mikemccandless.com wrote:
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.
It's unclear if this case is caused by IO contention, or the OS
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
luc...@mikemccandless.com wrote:
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.
It's unclear if this case is caused by IO contention, or the OS
: I'm seeing this stack trace when I try to view a specific document, e.g.
: /admin/luke?id=1 but luke appears to be working correctly when I just
FWIW: I was able to reproduce this using the example setup (i picked a
doc id at random) suspecting it was a bug in docFreq when using multiple
Distributed search requires a bunch of shard names in the URL. That's all.
Note that a distributed search does not use the data of the Solr instance
you call.
You can create an entry point for your distributed search by adding a
new requestHandler element in solrconfig.xml. You would add the
shard list parameter to the defaults list. Do
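The entry point described above might look like this in solrconfig.xml; the handler name and shard hosts are placeholders:

```xml
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- every request to /distrib fans out to these shards -->
    <str name="shards">solr1:8983/solr,solr2:8983/solr</str>
  </lst>
</requestHandler>
```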
On Fri, Nov 13, 2009 at 5:41 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: I'm seeing this stack trace when I try to view a specific document, e.g.
: /admin/luke?id=1 but luke appears to be working correctly when I just
FWIW: I was able to reproduce this using the example setup (i
Folks,
I am trying to get Lucene MMAP to work in solr.
I am assuming that when I configure MMAP the entire index will be loaded
into RAM.
Is that the right assumption ?
I have tried the following ways for using MMAP:
Option 1. Using the solr config below for MMAP configuration
Thanks for the link - there doesn't seem a be a fix version specified,
so I guess this will not officially ship with lucene 2.9?
-Peter
On Wed, Nov 11, 2009 at 10:36 PM, Robert Muir rcm...@gmail.com wrote:
Peter, here is a project that does this:
I'm not sure this is what you are looking for,
but there is FieldNormModifier tool in Lucene.
Koji
--
http://www.rondhuit.com/en/
Avlesh Singh wrote:
AFAIK there is no way to reset the doc boost. You would need to re-index.
Moreover, there is no way to search by boost.
Cheers
Avlesh
On
ah, thanks, i'll tentatively set one in the future, but definitely not 2.9.x
more just to show you the idea, you can do different things depending on
different runs of writing systems in text.
but it doesn't solve everything: you only know it's Latin script, not English,
so you can't safely
: FWIW: I was able to reproduce this using the example setup (i picked a
: doc id at random) suspecting it was a bug in docFreq
:
: Probably just a null being passed in the text part of the term.
: I bet Luke expects all field values to be strings, but some are binary.
I'm not sure i follow
: I tried to reproduce this in 1.4 using an index/configs created with 1.3,
: but i got a *different* NPE when loading this url...
I should have tried a simpler test ... I get NPEs just trying to execute
a simple search for *:* when I try to use the example index built
in 1.3 (with the 1.3
When does StreamingUpdateSolrServer commit?
I know there's a threshold and thread pool as params but I don't see a commit
timeout. Do I have to manage this myself?
There is no direct way.
Let's say you have a nocopy_s and you do not want a copy
nocopy_str_s. This might work: declare nocopy_str_s as a field and
make it not indexed and not stored. I don't know if this will work.
It requires two overrides to work: 1) that declaring a field name that
matches a
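A sketch of that workaround in schema.xml, using the nocopy names from the message; as the author notes, this relies on override behavior and is untested:

```xml
<!-- catch-all copy of *_s into *_str_s -->
<copyField source="*_s" dest="*_str_s"/>

<!-- explicit declaration shadows the *_str_s dynamic pattern, so the
     copied value ends up neither indexed nor stored -->
<field name="nocopy_str_s" type="string" indexed="false" stored="false"/>
```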
This looks exactly like what I was needing ... this looks like it would be a
great tool / addition to Solr web interface but it looks like it only takes
(Directory d, Similarity s) (vs. subset collection of documents) ...
Either way great find, thanks for your help ...
- Jon
On Nov 13, 2009,
This is one case where permanent caches are interesting. Another case
is highlighting: in some cases highlighting takes a lot of work, and
this work is not cached.
It might be a cleaner architecture to have session-maintaining code in
a separate front-end app, and leave Solr session-free.
On
Unless I slept through it, you still need to explicitly commit, even with SUSS.
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
- Original Message
From: erikea...@yahoo.com erikea...@yahoo.com
To:
I thought that was the way to use it (but I've never had to use it myself) and
that it means memory through the roof, yes.
If you look at the Solr Admin statistics page, does it show you which Directory
you are using?
For example, on 1 Solr instance I'm looking at I see:
readerDir :
So I think the question is really:
If I stop the servlet container, does Solr issue a commit in the shutdown hook
in order to ensure all buffered docs are persisted to disk before the JVM
exits.
I don't have the Solr source handy, but if I did, I'd look for Shutdown,
Hook and finalize in the
I'm testing out the final release of Solr 1.4 as compared to the build
I have been using from around June.
I'm using the dismax handler for searches. I'm finding that
highlighting is completely broken compared to previously. Much
more text is returned than it should be for each string in lst
Let's take a step back. Why do you need to optimize? You said: As long as
I'm not optimizing, search and indexing times are satisfactory. :)
You don't need to optimize just because you are continuously adding and
deleting documents. On the contrary!
Otis
--
Sematext is hiring --
The 'maxSegments' feature is new with 1.4. I'm not sure that it will
cause any less disk I/O during optimize.
The 'mergeFactor=2' idea is not what you think: in this case the index
is always mostly optimized, so you never need to run optimize.
Indexing is always slower, because you amortize the
Apparently one of my conf files was broken - odd that I didn't see any
exceptions. Anyhow - excuse my haste, I don't see the problem now.
-Peter
On Fri, Nov 13, 2009 at 11:06 PM, Peter Wolanin
peter.wola...@acquia.com wrote:
I'm testing out the final release of Solr 1.4 as compared to the
I am unable to get the file
http://old.nabble.com/file/p26335171/dataimport.temp.xml
On Fri, Nov 13, 2009 at 4:57 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
Noble Paul നോബിള് नोब्ळ्-2 wrote:
no obvious issues.
you may post your entire data-config.xml
Here it is, exactly as last
I would go with polling Solr to find what is not yet there. In
production, it is better to assume that things will break, and have
backstop janitors that fix them. And then test those janitors
regularly.
On Fri, Nov 13, 2009 at 8:02 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
So I
OK. Is there anyone trying it out? Where is this code? I can try to help ..
On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer
mauricioschef...@gmail.com wrote:
I meant the standard IO libraries. They are different enough that the code
has to be manually ported. There were some automated tools
<dataConfig>
  <dataSource name="caffdubya" driver="org.postgresql.Driver"
    url="jdbc:postgresql://db1/cathdb_v3_3_0" user="USER" password="PASS"
  />
  <dataSource name="sinatra" driver="oracle.jdbc.OracleDriver"
    url="jdbc:oracle:thin:@db2:1521:biomapwh" user="USER" password="PASS"
  />
  <!-- The following path is on