You could also put a short representation of the date (I suggest days since
01.01.2010) as a payload and calculate the boost with a payload function in
the similarity.
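At index time such a payload could be attached with the delimited payload
filter; a sketch (the scoring side would still need a payload-aware query):

<filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="integer"/>

Tokens would then be indexed as e.g. word|340, 340 being the day count.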
-----Original Message-----
From: ext Jason Brown [mailto:jason.br...@sjp.co.uk]
Sent: Monday, 29 November 2010 17:28
To:
Here is the result with debugQuery, for the term annual:
<result name="response" numFound="0" start="0"/>
<lst name="debug">
<str name="rawquerystring">annual</str>
<str name="querystring">annual</str>
<str name="parsedquery">text:year text:twelve-month text:onceayear text:yearbook</str>
<str name="parsedquery_toString">text:year
I found the problem: the solr.EnglishPorterFilterFactory in the type="query"
analyzer formed that parsedquery.
We had the same problem for our fields and we wrote a Tokenizer using the icu4j
library, breaking tokens at script changes and dealing with them according to
the script and the configured BreakIterators.
This works out very well, as we also add the script information to the token
so that later filters
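As a rough illustration of the script-change splitting (not our actual
Tokenizer; icu4j's UScript is real, the rest is just a sketch):

import com.ibm.icu.lang.UScript;

public class ScriptRuns {
    // Print runs of same-script text; a real Tokenizer would emit each run
    // as a token tagged with its script code, then apply the BreakIterator
    // configured for that script.
    public static void main(String[] args) {
        String text = "Latin text mixed with 漢字 and кириллица";
        int start = 0;
        int run = UScript.getScript(text.codePointAt(0));
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int s = UScript.getScript(cp);
            if (s != run && s != UScript.COMMON && s != UScript.INHERITED) {
                System.out.println(text.substring(start, i) + " [" + UScript.getName(run) + "]");
                start = i;
                run = s;
            }
            i += Character.charCount(cp);
        }
        System.out.println(text.substring(start) + " [" + UScript.getName(run) + "]");
    }
}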
aha aha :D
Hm, I don't know. We import in 2-million-document steps because we think that
Solr locks our database and we want better control of the import ...
As mentioned, in the typical case it's important that the field names be
included in the signature, but I imagine there would be cases where you
wouldn't want them included (like a simple concat Signature for building
basic composite keys).
I think the Signature API could definitely be
The index itself isn't corrupt - just one of the segment files. This
means you can read the index (less the offending segment(s)), but once
this happens it's no longer possible to
access the documents that were in that segment (they're gone forever),
nor write/commit to the index (depending on the
Hi,
I might not understand your case right, but can you not add an extra
publishedDate field and then specify a secondary (after relevance) sort by
that?
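In SolrJ that dual sort would look roughly like this (a sketch; the query
string and field name are placeholders):

SolrQuery query = new SolrQuery("your query");
query.setSortField("score", ORDER.desc);
query.addSortField("publishedDate", ORDER.desc);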
On 30 November 2010 08:05, jan.kure...@nokia.com wrote:
You could also put a short representation of the date (I suggest days since
Hmm this is in fact a regression.
TopFieldCollector expects (but does not verify) that numHits is > 0.
I guess to fix this we could fix TopFieldCollector.create to return a
NullCollector when numHits is 0.
But: why is your app doing this? Ie, if numHits (rows) is 0, the only
useful thing you
Hi - you do understand my case - we tried what you suggested, but as the
relevancy is very precise we couldn't get it to do a dual-sort.
I like the idea of using one of the dismax parameters (bf) to in-effect
increase the boost on a newer document.
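For reference, the standard recency-boost recipe from the Solr wiki would
look like this, assuming publishedDate is a TrieDate-based field (ms() needs
that); 3.16e-11 is 1/(one year in milliseconds):

bf=recip(ms(NOW,publishedDate),3.16e-11,1,1)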
Thanks for all replies, most useful.
ahhh I see..good point..yes, for a high number of unique scores the
secondary sort won't have any effect..
On 30 November 2010 09:32, Jason Brown jason.br...@sjp.co.uk wrote:
Hi - you do understand my case - we tried what you suggested, but as the
relevancy is very precise we couldn't get it
Hi,
I have written a plugin to filter on email types and keep those tokens;
however, when I run it in the analysis page in the admin UI it all works fine.
But when I use the data import handler to import the data and set the field
type it doesn't remove the other tokens and keeps the field in the
On 30.11.2010 10:56, Greg Smith wrote:
Hi,
I have written a plugin to filter on email types and keep those tokens;
however, when I run it in the analysis page in the admin UI it all works fine.
But when I use the data import handler to import the data and set the field
type it doesn't remove
Hi,
I found the problem:
The package of the class changed for 1.4.1:
From: import org.apache.solr.response.SolrQueryResponse;
To: import org.apache.solr.request.SolrQueryResponse;
Best,
---
Hong-Thai
-----Original Message-----
From: Hong-Thai Nguyen
On Nov 29, 2010, at 5:17 PM, Shawn Heisey wrote:
I was just in a meeting where we discussed customer feedback on our website.
One thing that the users would like to see is galleries where photos that
are part of a set are grouped together under a single result. This is
basically field
Hi,
I was wondering how I would go about getting the lucene docid included in the
results from a solr query?
I've built a QueryParser to query another Solr instance and join the results
of the two instances through the use of a Filter. The Filter needs the
Lucene docid to work. This is
Take a look at this:
http://vimeo.com/16102543
For that amount of data it isn't that easy :-)
We are looking into building a reporting feature and investigating solutions
which will allow us to search through our logs for downloads, searches and
view history.
Each log item is relatively
Hi,
I have a Windows cluster that I would like to install Solr onto; there
are two nodes that provide basic failover. I was thinking of this setup:
Tomcat installed as win service
Two solr instances sharing the same index
The second instance would take over when the first fails, so you
I know it's not Solr... but perhaps you should have a look at this:
http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
On Tue, Nov 30, 2010 at 12:58 PM, Peter Karich peat...@yahoo.de wrote:
take a look into this:
http://vimeo.com/16102543
for that amount of
We do a lot of precisely this sort of thing. Ours is a commercial
product (Honeycomb Lexicon) that extracts behavioural information from
logs, events and network data (don't worry, I'm not pushing this on
you!) - only to say that there are a lot of considerations beyond base
Solr when it comes to
Solr doesn't lock anything as far as I know, it just executes the
query you specify. The query you specify may well do bad things
to your database, but that's not Solr's fault. What happens if you
simply try executing the query outside Solr? Do you see the
same locking behavior?
You might want to
On Tue, Nov 30, 2010 at 10:29 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Hmm this is in fact a regression.
TopFieldCollector expects (but does not verify) that numHits is > 0.
I guess to fix this we could fix TopFieldCollector.create to return a
NullCollector when numHits is 0.
Bernd,
Looking at the results returned in the search results, the field is populated
with all of the information regardless of whether there was an email
contained in the contents.
Would the way the analysers and tokens are handled be different if using a
copyField?
Thanks
On 30 November 2010
See below. If this still doesn't make sense, could you show us some
examples?
Best
Erick
On Tue, Nov 30, 2010 at 8:33 AM, Greg Smith audi...@gmail.com wrote:
Bernd,
Looking at the results returned in the search results, the field is
populated
with all of the information regardless of whether
Hello.
The index is about 28 million documents. When I start a delta-import it looks
at modified, but the delta-import takes too long: Solr needs over an hour for
the delta.
That's my query. All sessions from the last hour should be updated, plus
everything changed. I think it's normal that Solr needs a long time for
Please provide more data. Specifically:
- How many documents are updated?
- Have you tried running this query without Solr? In other words, have you
investigated whether the speed issue is simply your SQL executing slowly?
- Why are you selecting the last 10 hours' data when all you want is
Every day ~30,000 documents and every hour ~1,200.
Multiple threads with DIH? How does that work?
How do you think the deltaQuery could be done better? XD
On Tue, Nov 30, 2010 at 8:24 AM, Martin Grotzke
martin.grot...@googlemail.com wrote:
Still I'm wondering, why this issue does not occur with the plain
example solr setup with 2 indexed docs. Any explanation?
It's an old option you have in your solrconfig.xml that causes a
different code path to
I copied the wrong query, hence the 10 hours ;)
I didn't test the query with 28 million records, but with a few million it
works fine. ...
Before I used DIH, I used PHP and imported documents directly into Solr, but
I want to use DIH because of the better performance, I think so ... grml ...
+1
That's exactly what we need, too.
On Mon, Nov 29, 2010 at 5:28 PM, Shawn Heisey elyog...@elyograg.org wrote:
On 11/29/2010 3:15 PM, Jacob Elder wrote:
I am looking for a clear example of using more than one tokenizer for a
single source field. My application has a single body field which
Hello,
Can someone explain the difference between queryNorm and fieldNorm in
debugQuery?
Why, if I push one bf boost up, does the queryNorm go down?
I made some changes... before, the situation was different. Why?
thanx
--
Gastone Penzo
Right. CJK doesn't tend to have a lot of whitespace to begin with. In the
past, we were using a patched version of StandardTokenizer which treated
@twitteruser and #hashtag better, but this became a release engineering
nightmare so we switched to Whitespace.
Perhaps I could rephrase the question
Hmm, I found some similar queries on stackoverflow and they did not recommend
exposing the lucene docId.
So, I guess my question becomes: What is the best way, from within my custom
QParser, to take a list of solr primary keys (that were retrieved from
elsewhere) and turn them into docIds? I
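One way, sketched against the Lucene 2.9 API that Solr 1.4 ships with
(assumes your uniqueKey field is named "id" and primaryKeys is the list you
already have):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

// inside the QParser, where req is the SolrQueryRequest it holds:
IndexReader reader = req.getSearcher().getReader();
TermDocs termDocs = reader.termDocs();
for (String key : primaryKeys) {
    termDocs.seek(new Term("id", key));
    if (termDocs.next()) {
        int docId = termDocs.doc(); // hand this to the Filter
    }
}
termDocs.close();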
Rather have a Master and multiple Slave combination, with master only being
used for writes and slaves used for reads.
Master to Slave replication is easily configurable.
Two Solr instances sharing the same index is not at all a good idea with both
writing to the same index.
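A minimal solrconfig.xml sketch of the two roles (the host name and poll
interval are placeholders):

On the master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

On each slave:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>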
Regards,
Jayendra
On
Hi,
I am running multiple Solr cores (solr-tomcat 1.4.0+ds1-1ubuntu1) under
Tomcat (6.0.24-2ubuntu1.4) on Ubuntu 10.04.1. I have a master server where
all Solr writes go, and a slave server that replicates all cores from the
master, and accepts all read-only queries.
After maxing out PermGen
I don't know, you'll have to debug it to see if it's the thing that takes so
long. Solr
should be able to handle 1,200 updates in a very short time unless there's
something
else going on, like you're committing after every update or something.
This may help you track down performance with DIH
Okay.
The query kills the database because there is no index on modified ...
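Something like this should fix that (the table name is made up):

CREATE INDEX idx_sessions_modified ON sessions (modified);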
Hi Tommaso,
On Nov 30, 2010, at 7:41am, Tommaso Teofili wrote:
Hi all,
in a replication environment if the host where the master is running
goes
down for some reason, is there a way to communicate to the slaves to
point
to a different (backup) master without manually changing
I need to index and search some PDF files which are very big (around 1000
pages each). How can I set maxFieldLength to unlimited?
Thanks so much for your help in advance,
Xiaohui
On Tue, Nov 30, 2010 at 3:09 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Tue, Nov 30, 2010 at 8:24 AM, Martin Grotzke
martin.grot...@googlemail.com wrote:
Still I'm wondering, why this issue does not occur with the plain
example solr setup with 2 indexed docs. Any explanation?
It's
Set the maxFieldLength value in solrconfig.xml to, say, 2147483647.
Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
From that thread, it appears you can just comment out the one in the
mainIndex section.
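That is, in the indexDefaults section of solrconfig.xml (2147483647 being
Integer.MAX_VALUE):

<maxFieldLength>2147483647</maxFieldLength>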
Best
Erick
On Tue, Nov 30, 2010 at
Hi,
Thanks Jacob and Ken for your replies.
I am not able to change the project architecture to add Lucandra, even if it
looks like a nice solution.
Going the VIP way can definitely be an option, even if I'd be more keen to
solve that inside Solr.
I am thinking to try and play with Collection Distribution
Hi,
I am using cached SQL entity processor in my data config, please find below
the structure of my data config file.
<entity name="object" query="select * from x where objecttype='test1'">
<entity name="objectproperty" query="select * from y"
processor="CachedSqlEntityProcessor"
Thanks so much for your help!
Xiaohui
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimitd
Set the maxFieldLength value in solrconfig.xml to,
We've got a largish corpus (~94 million documents). We'd like to be able
to sort on one of the string fields. However this takes an incredibly
long time. A warming query for that field takes about 20 minutes.
However most of the time the result sets are small since we use filters
heavily -
Bump. Anyone?
-J
On Nov 29, 2010, at 3:17 PM, John Williams wrote:
Recently, we have started to get Bad file descriptor errors in one of our
Solr instances. This instance is a searcher and its index is stored on a
local SSD. The master, however, has its index stored on NFS, which seems to
I set maxFieldLength to 2147483647, restarted tomcat and re-indexed pdf files
again. I also commented out the one in the mainIndex section. Unfortunately
the files are still truncated if the file size is more than 20MB.
Any suggestions? I really appreciate your help!
Xiaohui
Hi,
I'd like to know if anybody has suggestions/opinions on what is
currently the best architecture for a distributed search system using Solr. The
use case is that of a system composed
of N indexes, each hosted on a separate machine, each index containing unique
content.
Options that
Greetings, we're wondering why we can issue the command to shut down
tomcat/solr but the process remains visible in memory (by using the top
command) and we have to manually kill the PID for it to release its
memory before we can (re)start tomcat/solr? Anybody have any ideas?
The process is using
Greetings, we are running one master and four slaves of our multicore
solr setup. We just served searches for our catalog of 8 million
products with this farm during black Friday and cyber Monday, our
busiest days of the year, and the servers did not break a sweat! Index
size is about 28GB.
Hi Robert,
I'd recommend launching Tomcat with -XX:+HeapDumpOnOutOfMemoryError
and -XX:HeapDumpPath=path to where you want the file to go, so then
you have something to look at versus a Gedankenexperiment :)
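With Tomcat those flags can go in bin/setenv.sh, e.g. (the dump path is just
a placeholder):

CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/solr-dumps"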
-- Ken
On Nov 30, 2010, at 3:04pm, Robert Petersen wrote:
Greetings, we are
I don't know who you are replying to here, but...
There's nothing to stop you doing:
* import 2m docs
* sleep 2 days
* import 2m docs
* sleep 2 days
* repeat above until done
* commit
There's no reason why you should commit regularly. If you need to slow
down for your DB, do, but that
After a recent Windows 7 crash (:-\), upon restart, Solr starts giving
LockObtainFailedException errors: (excerpt)
30-Nov-2010 23:10:51 org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock
obtain timed out:
Hi Tommaso,
I believe you can tell each server to act as a master (which means it
can have its indexes pulled from it).
You can then include the master hostname in the URL that triggers a
replication process. Thus, if you triggered replication from outside
solr, you'd have control over which
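Such a trigger URL would look roughly like this, assuming the stock
/replication handler (host names are placeholders):

http://slave-host:8983/solr/replication?command=fetchindex&masterUrl=http://master-host:8983/solr/replication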
I cannot say how mature the code for B) is, but it is not yet included
in a release.
If you want the ability to distribute content across multiple nodes (due
to volume) and want resilience, then use both.
I've had one setup where we have two master servers, each with four
cores. Then we have two
On 11/30/2010 2:27 PM, Cinquini, Luca (3880) wrote:
Hi,
I'd like to know if anybody has suggestions/opinions on what is
currently the best architecture for a distributed search system using Solr. The
use case is that of a system composed
of N indexes, each hosted on a separate machine,
What would I do with the heap dump though? Run one of those Java heap
analyzers looking for memory leaks or something? I have no experience
with those. I saw there was a bug fix in Solr 1.4.1 for a 100-byte memory
leak occurring on each commit, but it would take thousands of commits to
make that
1. Make sure the port in <Server port="8005" shutdown="SHUTDOWN"> is not
already in use.
2. Run ./bin/shutdown.sh and tail -f logs/xxx to see what the server is doing.
If you just fed in data or modified the index and didn't flush/commit, it
will do something while shutting down.
2010/12/1 Robert Petersen rober...@buy.com:
On Tue, Nov 30, 2010 at 6:04 PM, Robert Petersen rober...@buy.com wrote:
My question is this. Why in the world would all of my slaves, after
running fine for some days, suddenly all at the exact same minute
experience OOM heap errors and go dead?
If there is no change in query traffic when
You may implement your own MergePolicy to keep one large index and merge all
the other small ones, or simply set the merge factor to 2 and keep the
largest index from being merged by setting maxMergeDocs to less than the
number of docs in the largest one.
So there is one large index and a small one. When adding a few docs, they
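In solrconfig.xml terms, something like (values are illustrative):

<mergeFactor>2</mergeFactor>
<maxMergeDocs>20000000</maxMergeDocs>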
On 11/30/2010 3:49 PM, Robert Petersen wrote:
That raises another question: top can show only 20 GB free out of 64, but
the tomcat/solr process only shows it's using half of that. What is using
the rest? The numbers don't add up...
Chances are that it's your operating system disk cache.
Hi, A diagram will be very much appreciated.
Thanks,
Jayant
From: u...@odoko.co.uk
To: solr-user@lucene.apache.org
Subject: Re: distributed architecture
Date: Wed, 1 Dec 2010 00:39:40 +
I cannot say how mature the code for B) is, but it is not yet included
in a release.
If you
Hi Upayavira,
this is a good start for solving my problem; can you please tell me what
such a replication URL looks like?
Thanks,
Tommaso
2010/12/1 Upayavira u...@odoko.co.uk
Hi Tommaso,
I believe you can tell each server to act as a master (which means it
can have its indexes pulled from
Hi team
My solr version is 1.4
There is an ArrayIndexOutOfBoundsException when I sort on one field; the
following is my code and the log info. Any help will be appreciated.
Code:
SolrQuery query = new SolrQuery();
query.setSortField("author", ORDER.desc);
Greetings,
The Seattle Scalability Meetup isn't slacking for the holidays. We've
got an awesome lineup for Wed, December 8 at 7pm:
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/
-Jake Mannix from Twitter will talk about the Twitter Search
infrastructure (with distributed Lucene)
On Wed, Dec 1, 2010 at 10:56 AM, Jerry Li zongjie...@gmail.com wrote:
Hi team
My solr version is 1.4
There is an ArrayIndexOutOfBoundsException when I sort on one field; the
following is my code and the log info. Any help will be appreciated.
Code:
SolrQuery query = new SolrQuery();
Wow, would you put a diagram somewhere up on the Solr site?
Or, here, and I will put it somewhere there.
And, what is a VIP?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’
Solr 4- You mean the Solr 'trunk' source or the Solr 1.4.1 release?
The 1.4.1 release does not have the TikaEntityProcessor, only the /extract code.
The Solr 3.x branch and the trunk have the TikaEP. I use the 3.x
branch and, well, the TikaEP has a few problems but can be hacked
around.