Ok, but I need a relation between the two types of document for faceting
on the label field.
Damien
Le 18/01/2011 18:55, Geert-Jan Brits a écrit :
The schemas are very different, I can't group them.
In contrast to what you're saying above, you may rethink the option of
combining both types of
[x] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[x] I/we build them from source via an SVN/Git checkout.
I rarely build, only if I would like to try an interesting patch.
[] Other (someone in
On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:
I was wondering if there are binary operation filters? I haven't seen any in the
book, nor was I able to find any using Google.
So if I had 0600(octal) in a permission field, and I wanted to return any
records that 'permission
I have to use some Lucene indexes, and Solr looks like the perfect
solution.
However, all I know about the Lucene indexes is what Luke tells me, and
simply setting the schema to represent all fields as text does not seem
to be working -- though as this is my first Solr, I am not sure if that
Hi all,
By adding more servers, do you mean sharding the index? And after sharding, how
will my query performance be affected?
Will the query execution time increase?
Thanks,
Isan Fulia.
On 19 January 2011 12:52, Grijesh pintu.grij...@gmail.com wrote:
Hi Isan,
It seems your index size is 25GB
You're right, the second query didn't result in an error, but it didn't give the
expected result either.
I'm going to have a look at the link you gave me.
Thanks !
From: Markus Jelsma markus.jel...@openindex.io
Sent: Tue Jan 18 21:31:52 CET 2011
To:
Ok I was already at this point.
My faceting system uses exactly what is described on this page. I read it in
the Solr 1.4 book, otherwise I wouldn't ask.
The problem is that filter queries don't affect the relevance score of
the results, so I want the terms in the main query.
We need a crawler for all web pages outside our CMS, but one crucial
feature seems to be missing in many of them - a way to detect changes in
these documents. Say that you have run a daily crawler job for two
months looking for new web pages to crawl in order to keep the Solr
index updated.
Dear All,
On a Linux system running a multi-core Solr server, we are
experiencing a problem of too many open files, which is causing Tomcat
to abort. Reading the documentation, one of the things it seems we can
do is to switch to using compound indexes. We can see that in the
solrconfig.xml there
Hello, I have a problem with Solr and it looks like it's related to RequestHandlers, but I
don't know what to do...
I have removed and reinstalled OpenJDK,
installed maven2 and Tika,
nothing changed.
Does someone have an idea for me?
Command:
curl
In the Solr admin advanced search page, the highlighted text is not displayed
correctly for Arabic characters!
I'm using Solr trunk 2011-01-10 ….
It used to work in Solr 1.4.1.
Does anybody know why?
How can I tell whether master and slave are in sync?
Is there a way apart from checking the index version of master and slave
using the two HTTP APIs below?
http://master_host:port/solr/replication?command=indexversion
http://slave_host:port/solr/replication?command=details
--
View this message in context:
In order to use the ExtractingRequestHandler, you have to first copy
apache-solr-cell-version.jar and all the libraries from
contrib/extraction/lib to a lib folder next to the conf folder of your
instance.
Also, check the URL because there is an ampersand missing.
Regards,
*Juan Grande*
On
Follow Ahmet's lead here. Selecting all documents and counting will
absolutely
not work for you once you get to any real-world corpus. You want
to turn on faceting, I'm pretty sure. Here's a good resource...
http://wiki.apache.org/solr/SimpleFacetParameters
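As a sketch of the facet approach Erick describes (the field name "category", host, and port are illustrative, not from the thread), the request boils down to a handful of parameters; the facet counts replace the need to fetch every document and count client-side:

```python
# Build a facet query: match everything, fetch no documents, and let Solr
# return per-term counts for the faceted field instead.
from urllib.parse import urlencode

params = {
    "q": "*:*",              # match all documents; we only want the counts
    "rows": 0,               # don't return any documents themselves
    "facet": "true",
    "facet.field": "category",   # hypothetical field name
    "facet.mincount": 1,
}
query_string = urlencode(params)
print("http://localhost:8983/solr/select?" + query_string)
```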
Hello Solrs,
I am looking into using Solr, but my intended usage would require having
many different indexes which are not connected (e.g. some multi-tenancy with
one or multiple indexes per user).
I understand that creating independent indexes in Solr happens by creating
Solr cores via CoreAdmin.
Notice the index version number? If it's equal, then they are in sync.
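As a sketch of that comparison (the XML shape below is an assumption based on Solr's usual `<lst>`/`<long>` response format; check what your /replication handler actually returns):

```python
# Extract the indexversion from a /replication response and compare
# master against slave; equal versions mean the two are in sync.
import xml.etree.ElementTree as ET

def index_version(response_xml):
    """Pull the indexversion value out of a /replication response."""
    root = ET.fromstring(response_xml)
    for node in root.iter("long"):
        if node.get("name") == "indexversion":
            return int(node.text)
    raise ValueError("no indexversion element found")

# Sample responses as they might come back from master and slave:
master = '<response><long name="indexversion">1295432023437</long></response>'
slave = '<response><long name="indexversion">1295432023437</long></response>'
print("in sync" if index_version(master) == index_version(slave) else "out of sync")
```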
On Wednesday 19 January 2011 13:37:32 Shanmugavel SRD wrote:
How can I tell whether master and slave are in sync?
Is there a way apart from checking the index version of master and slave
using the two HTTP APIs below?
[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
[X] ASF Mirrors (linked in our release announcements or via the Lucene website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a downstream
project)
Is anyone familiar with the environment variable, JAVA_OPTS? I set
mine to a much larger heap size and never had any of these issues
again.
JAVA_OPTS="-server -Xms4048m -Xmx4048m"
Adam
On Wed, Jan 19, 2011 at 3:29 AM, Isan Fulia isan.fu...@germinait.com wrote:
Hi all,
By adding more servers
Ok, but I can't find the folders in the Tomcat folder /varlib/tomcat6/solr/.
There is no existing contrib folder or lib folder?
And where is the ampersand missing???
curl
http://192.168.105.210:8080/solr/rechnungen/update/extract?literal.id=1234567uprefix=attr_commit=true;
-F myfile=@test.xls
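For what it's worth, the command above runs the parameters together without separators; here is a sketch of the same URL with the ampersands restored (host, core name, and values taken verbatim from the message):

```python
# Rebuild the extract URL from the command above with proper '&' separators
# between the three parameters.
from urllib.parse import urlencode

params = {"literal.id": "1234567", "uprefix": "attr_", "commit": "true"}
url = ("http://192.168.105.210:8080/solr/rechnungen/update/extract?"
       + urlencode(params))
print(url)
# When pasting into a shell, quote the whole URL so '&' isn't treated as
# a background operator:  curl "<url>" -F "myfile=@test.xls"
```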
On Wed, Jan 19, 2011 at 7:35 PM, Jörg Agatz joerg.ag...@googlemail.com wrote:
Ok, but I can't find the folders in the Tomcat folder /varlib/tomcat6/solr/.
There is no existing contrib folder or lib folder?
The contrib/extraction/lib folder should be under the top-level
directory of your Solr source
Jörg_Agatz, hello!
Copy them to the Tomcat common lib folder.
=== On 2011-01-19 22:06:18, you wrote: ===
Ok, but I can't find the folders in the Tomcat folder /varlib/tomcat6/solr/.
There is no existing contrib folder or lib folder?
And where is the ampersand missing???
curl
So 'fieldName.x' is how to address bits?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from
Sorry for the repeat, trying to make sure this gets on the newsgroup to 'all'.
So 'fieldName.x' is how to address bits?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’ mistakes, so you
Hi,
I'm looking for ideas on how to make an efficient facet query on a
user's history with respect to the catalog of documents (something
like Read document already: yes / no). The catalog is around 100k
titles and there are several thousand users. Of course, each user has
a different history,
Did some more searching this morning. Perhaps being bleary-eyed helped :-) I
found this JIRA which does bitwise boolean operator filtering:
https://issues.apache.org/jira/browse/SOLR-1913
I'm not that sure how to interpret JIRA pages for features. It's 'OPEN', but
the
comments all say it
Issue created:
https://issues.apache.org/jira/browse/SOLR-2323
On Tuesday 04 January 2011 20:08:40 Markus Jelsma wrote:
Hi,
It seems abort-fetch nicely removes the index directory which I'm
replicating to, which is fine. Restarting, however, does not trigger
the same feature as the
Take a look at Apache ManifoldCF (incubating, close to 0.1 release):
http://incubator.apache.org/connectors/
In addition to a fairly sophisticated general web crawler which maintains
the state of crawled web pages it has a file system crawler and crawlers for
a variety of document
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[ X ] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[ X ] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream project)
[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a downstream
project)
I know that this is often a performance problem -- but Erick, I am
interested in the 'better solution' you hint at!
There are a variety of cases where you want to 'dump' all documents from
a collection. One example might be in order to build a Google SiteMap
for your app that's fronting your
What query are you actually trying to do? There's probably a way to do
it, possibly using nested queries -- but not using illegal syntax like
some of your examples! If you explain what you want to do, someone may
be able to tell you how. From the hints in your last message, I suspect
nested
No. There is no built-in way to address 'bits' in Solr that I am aware
of. Instead you can think about how to transform your data at indexing
into individual tokens (rather than bits) in one or more fields, such
that they are capable of answering your query. Solr works in tokens as
the basic
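Erick's token suggestion can be sketched for the 0600-style permission field mentioned earlier in the digest; the token names below are invented for illustration:

```python
# Decompose a Unix-style permission bitmask into named tokens at index
# time; queries then match tokens instead of needing bitwise operators.
PERMISSION_BITS = [
    (0o400, "owner_read"), (0o200, "owner_write"), (0o100, "owner_execute"),
    (0o040, "group_read"), (0o020, "group_write"), (0o010, "group_execute"),
    (0o004, "other_read"), (0o002, "other_write"), (0o001, "other_execute"),
]

def permission_tokens(mode):
    """Turn an octal permission mask into the list of tokens to index."""
    return [name for bit, name in PERMISSION_BITS if mode & bit]

# 0600 becomes two tokens, so a query like permissions:owner_write matches
# without any bitwise filter support in Solr.
print(permission_tokens(0o600))
```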
There's nothing that I know of that would accomplish this, sorry...
Best
Erick
On Tue, Jan 18, 2011 at 11:22 PM, kun xiong xiongku...@gmail.com wrote:
Hi Erick,
Thanks for the fast reply. I kind of figured it was not supposed to be
that way.
But it would have some benefits when we need
You're better off using two cores on the same Solr instance rather than two
instances of Tomcat, that way you avoid some overhead.
The usual advice is to monitor the Solr caches, particularly for evictions,
and size the Solr caches accordingly. You can see these from the admin/stats
page
and also
Then you probably want to consider simply flattening the data and storing
the
relevant data with a single schema. If that doesn't work for you, there is a
limited
join capability going into the trunk, see:
https://issues.apache.org/jira/browse/SOLR-2272
Best
Erick
On Wed, Jan 19, 2011 at 3:17
I don't really think this is possible/reasonable. There's nothing fixed
about
a Lucene index, you could index a field in different documents with any
number of analysis chains. The tricky part here will be, as you've discovered,
finding a way to match the Solr schema closely enough to get your desired
Hi all,
I'm looking into Solr's highlighting component. As far as I understand,
Solr's response.getHighlighting() gives back a formatted string along with the id;
then we have to loop through the searched documents, search for the id, and
then replace the formatted string. This approach will
Let's back up a ways here and figure out why you're getting so many
files open.
1 how many files are in your index?
2 are you committing very frequently?
3 or do you simply have a LOT of cores?
4 do you optimize your indexes? If so, how many files do you have in your
cores before/after
Indeed, wouldn't reducing the number of segments be a better idea? Speeds up
searching too! Do you happen to have a very high mergeFactor value for each
core?
On Wednesday 19 January 2011 17:53:12 Erick Erickson wrote:
You're perhaps exactly right in your approach, but with a bit more info
we
Solr will handle lots of cores, but that page is talking about lots.
Thousands.
But I question why you *require* many different indexes. It's perfectly
reasonable
to store different fields in different documents in the *same* index, unlike
a table in an RDBMS.
There are good reasons to have
So, if I used something like r-u-d-o in a field (read, update, delete, others) I
could get it tokenized to those four characters, and then search for those in
that field. Is that what you're suggesting? (Thanks, by the way.)
An article I read created a 'hybrid' access control system (can't remember
If someone is looking for good documentation and getting started guides, I am
putting this in the newsgroups to be searched upon. I recommend:
A/ The Wikis: (FREE)
http://wiki.apache.org/solr/FrontPage
B/ The book and eBook: (COSTS $45.89)
I would like to index the information of my employees to be able to query
by some fields such as: e-mail, registration, ID, cell phone, name.
I am very new to SOLR and would like to know how to index these fields this
way and how to search filtering by some of these fields.
Thanks in advance
Hi,
I notice that in the schema, it is only possible to specify an Analyzer
class, but not a Factory class as for the other elements (Tokenizer,
Filter, etc.).
This limits the use of this feature, as it is impossible to specify
parameters for the Analyzer.
I have looked at the IndexSchema
Yep, that's what I'm suggesting as one possible approach to consider,
whether it will work or not depends on your specifics.
Character length in a token doesn't really matter for solr performance.
It might be less confusing to actually put read update delete own (or
whatever 'o' stands for)
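On the query side, the tokenized scheme Erick describes might be used roughly like this (the field name `perm` and the helper are assumptions for illustration, not from the thread):

```python
# Build a filter query over the permission-token field: require every
# listed token, so only documents carrying all of them match.
from urllib.parse import urlencode

def perm_filter(*required_tokens):
    """Build an fq clause requiring every listed permission token."""
    return " AND ".join("perm:%s" % tok for tok in required_tokens)

params = {"q": "solr", "fq": perm_filter("read")}
print(urlencode(params))
```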
That someone should just visit the wiki:
http://wiki.apache.org/solr/SolrResources
If someone is looking for good documentation and getting started guides, I
am putting this in the newsgroups to be searched upon. I recommend:
A/ The Wikis: (FREE)
http://wiki.apache.org/solr/FrontPage
Actually we don't have much load on the server (the usage currently is
quite low) but user queries are very complex, e.g. long phrases, multiple
proximity/wildcard queries, etc., so I know these values need to be tried out, but I
wanted to see what's the right 'start' so that I am not way off.
Also
Three-dimensional multi-value sounds good. Tough choice on characters
vs full-length words. Full-length is easier & less confusing, but with
hopefully millions of documents in the future, it increases index size.
Sent from Yahoo! Mail on Android
http://lucene.apache.org/solr/#getstarted
I would like to index the information of my employees to be able to query
by some fields such as: e-mail, registration, ID, cell phone, name.
I am very new to SOLR and would like to know how to index these fields this
way and how to search
You only need so much for Solr so it can do its thing. Faceting can take quite
some memory on a large index but sorting can be a really big RAM consumer.
As Erick pointed out, inspect and tune the cache settings and adjust RAM
allocated to the JVM if required. Using tools like JConsole you can
Hi!
I would like to announce Solr-RA, Solr with RankingAlgorithm. Solr-RA
uses RankingAlgorithm, a new scoring and ranking algorithm, instead
of Lucene's to rank the searches. Solr with RA seems to enable Solr
searches to be comparable to Google site search results, and much better
than
Hi,
I've never seen Solr's behaviour with a huge number of values in a multi-valued
field, but I think it should work alright. Then you can store a list of user
IDs along with each book document and use filter queries to include or
exclude the book from the result set.
Cheers,
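A rough sketch of that include/exclude idea (the field name `read_by` is made up for illustration):

```python
# Each book document carries a multi-valued field of user IDs who have
# read it; an fq then includes or excludes one user's already-read books.
def history_fq(user_id, already_read):
    """fq for books the user has read (True) or has not read (False)."""
    clause = "read_by:%s" % user_id
    return clause if already_read else "-" + clause

print(history_fq("u42", True))   # read_by:u42
print(history_fq("u42", False))  # -read_by:u42
```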
Hi,
I'm
We do have sorting but not faceting. OK, so I guess there is no 'hard and
fast rule' as such, so I will play with it and see.
Thanks for the help
On Wed, Jan 19, 2011 at 11:48 PM, Markus Jelsma
markus.jel...@openindex.iowrote:
You only need so much for Solr so it can do its thing. Faceting can
Hi,
I'm unsure if I completely understand, but you first had the error for
local.code and then set the property in solr.xml? Then of course it will give
an error for the next undefined property that has no default set.
If you use a property without a default it _must_ be defined in solr.xml or
I've checked the archive, and plenty of people have suggested an
arrangement where you can have two cores which share a configuration but
maintain separate data paths. But I can't seem to get solr to stop
thinking solrconfig.xml is the first and last word for any value
regarding data. I am
Sorting on field X will build an array of the size of maxDoc. The data type
equals the one used by the field you're sorting on. Also, if you have a very
high amount of deletes per update it might be a good idea to optimize as well
since it reduces maxDoc to the number of documents that actually
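A back-of-the-envelope sketch of the memory involved (the 10M maxDoc figure is invented, and the 8 bytes per entry assumes a long/double sort field, ignoring JVM overhead, so treat it as a lower bound):

```python
# Estimate the size of the per-field sort array Markus describes:
# one entry per document in maxDoc, including deleted documents.
max_doc = 10_000_000        # documents, deleted ones included
bytes_per_entry = 8         # long/double sort field
mb = max_doc * bytes_per_entry / (1024 * 1024)
print("%.0f MB per sorted field" % mb)
```

This is also why the optimize advice above helps: purging deletes shrinks maxDoc, and the array shrinks with it.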
You have set the property already, but I haven't seen you use that same
property for the dataDir setting in solrconfig.
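For reference, using the property in solrconfig.xml would look roughly like this (the property name is the one from this thread; the fallback path after the colon is illustrative):

```xml
<!-- ${local.code} must be defined in solr.xml or solrcore.properties,
     or given a default after the colon as shown here -->
<dataDir>${local.code:/var/solr/data}/data</dataDir>
```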
I've checked the archive, and plenty of people have suggested an
arrangement where you can have two cores which share a configuration but
maintain separate data paths. But I
Hi,
I need to add some metadata to a schema file in Solr - such as a version and
current transaction id. I need to be able to query Solr to get this
information. What would be the best way to do this?
Thanks,
David
The problem is going to be 'near real time' indexing issues. Solr 1.4
at least does not do a very good job of handling very frequent commits.
If you want to add to the user's history in the Solr index every time
they click the button, and they click the button a lot, and this
naturally leads
Hi,
Are there performance issues during the index switch?
As the size of index gets bigger, response time slows down? Are there any
studies on this?
Thanks,
Tri
During commit?
A commit (and especially an optimize) can be expensive in terms of both
CPU and RAM as your index grows larger, leaving less CPU for querying,
and possibly less RAM which can cause Java GC slowdowns in some cases.
A common suggestion is to use Solr replication to separate out
I even have to define default values for the dataimport.delta values? That
doesn't seem right.
On Wed, Jan 19, 2011 at 11:57 AM, Markus Jelsma
markus.jel...@openindex.iowrote:
Hi,
I'm unsure if i completely understand but you first had the error for
local.code and then set the property in
Hi,
Are there performance issues during the index switch?
What do you mean by index switch?
As the size of index gets bigger, response time slows down? Are there any
studies on this?
I haven't seen any studies as of yet but response time will slow down for some
components. Sorting
No, you only need defaults if you use properties that are not defined in
solr.xml or solrcore.properties.
What would the value of local.core be if you don't define it and you
don't specify a default? Quite unpredictable I guess =)
i even have to define default values for the
David,
I'm not sure if you are asking about adding this to the schema.xml file or to
the Solr schema and therefore the Solr index?
If the former, you could put it in comments, then get the schema via HTTP (see
Admin UI for the URL), and grep for your line from there.
If the latter, this sounds
The error I am getting is that I have no default value
for ${dataimporter.last_index_time}.
Should I just define -00-00 00:00:00 as the default for that field?
On Wed, Jan 19, 2011 at 12:45 PM, Markus Jelsma
markus.jel...@openindex.iowrote:
No, you only need defaults if you use properties
Yes, during a commit.
I'm planning to do as you suggested, having a master do the indexing and
replicating the index to a slave which leads to my next questions.
While the slave replicates the index files from the master, how does it impact
performance on the slave?
Tri
--- On Wed,
Ok, have you defined dataimporter.last_index_time in solr.xml or
solrcore.properties? If not, then you can either define the default value or
set it in solrcore.properties or solr.xml.
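For reference, dataimporter.last_index_time is normally maintained by the DataImportHandler itself in conf/dataimport.properties; a typical entry might look like this (the timestamp is invented, and the backslashes are Java-properties escaping of the colons):

```properties
# conf/dataimport.properties, written by DIH after each import
last_index_time=2011-01-19 12\:45\:00
```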
Maybe a catch up on the wiki clears things up:
Hello Users,
About a little over a year ago, a few of us started working on what we called
SolrCloud.
This initial bit of work was really a combination of laying some base work -
figuring out how to integrate ZooKeeper with Solr in a limited way, dealing
with some infrastructure - and picking
Tri,
During replication:
* extra disk IO on slaves during replication - worse if you are replicating an
optimized index, which can hurt if your index is not RAM resident
* the above will consume some of your OS buffer cache, which can hurt
* increased network usage - never seen this becoming a
On 1/19/2011 2:56 PM, Tri Nguyen wrote:
Yes, during a commit.
I'm planning to do as you suggested, having a master do the indexing and replicating the index to a slave which leads to my next questions.
While the slave replicates the index files from the master, how does it impact
Thanks Otis, yes it is the former and it definitely solves my problem for my
static metadata.
I realise now, though, that for dynamic values like transaction id, I probably
need a different method for storing this metadata. Is there a
standard way of adding metadata to a Solr core and being
Hello Erick,
Thanks for your answer!
But I question why you *require* many different indexes. [...] including
isolating one
users'
data from all others, [...]
Yes, that's exactly what I am after - I need to make sure that indexes don't
mix, as every user shall only be able to query his own
Hi all
We are planning to move our search core from the Lucene library to Solr, and
we are new here.
We have a question: which parser should we choose?
Our original query for Lucene is kind of complicated.
Ex: *+((name1:A name2:B)^1000 (category1:C ^100 category:D ^10) ^100)
+(location1:E
Hi all
We are planning to move our search core from
the Lucene library to Solr, and
we are new here.
We have a question: which parser should we choose?
Our original query for Lucene is kind of complicated.
Ex: *+((name1:A name2:B)^1000 (category1:C ^100
category:D ^10) ^100)
Markus,
It's not wt, it's qt; wt is for the response type.
Also, qt is not for the query parser, it's for the request handler. In solrconfig.xml
many request handlers can be defined, using the dismax query parser
or the lucene query parser.
If you want to change the query parser, it's the defType parameter for
We constructed our query with the Lucene API before, using BooleanQuery, TermQuery,
those kinds of things.
The string I provided is the value from the Query.toString() method. The types are all
String.
2011/1/20 Ahmet Arslan iori...@yahoo.com
Hi all
We are planning to move our search core from
Lucene library
Hi Mark,
I was just working on SolrCloud for my R&D and a question came to mind.
Since in SolrCloud the configuration files are shared across all cloud
instances, if I have different configuration files for different cores,
how can I manage that with my ZooKeeper-managed SolrCloud?
By adding more servers I mean adding more searchers (slaves) behind a load
balancer, not sharding.
Sharding is required when your index size grows beyond about
50GB.
-
Thanx:
Grijesh
That example string means our query is a BooleanQuery containing
BooleanQuerys.
I am wondering how to write a complicated BooleanQuery for dismax, like (A
OR B OR C) AND (D OR E),
or whether I have to use the Lucene query parser.
2011/1/20 Lalit Kumar 4 lkum...@sapient.com
Sent on my BlackBerry® from