Am I correct in thinking that trie fields don't support
sortMissingLast (my tests show that they don't). If not, is there any
plan for adding it in?
Regards,
Steve
On Wed, Sep 30, 2009 at 3:01 PM, con convo...@gmail.com wrote:
Hi all
I am getting incorrect results when I search with numbers only or with strings
containing numbers.
When such a search is done, all the results in the index are returned,
irrespective of the search key.
For eg, the phone number
On Tue, Sep 29, 2009 at 6:42 PM, Jörg Agatz joerg.ag...@googlemail.com wrote:
Hi Users...
I have a problem.
I have a lot of fields (type=text). To search across all fields I copy all
fields into the default text field and use that field for the default search.
Now I want to search...
This is into a
Hi All,
I'm working with data that has multiple date precisions, most of
which do not have a time associated with them: rather centuries (like
the 1800s), years (like 1867), and year-months (like 1918-11). I'm
able to sort and search using a workaround where we store the date as a
string
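The string workaround above relies on a simple property, shown here as a minimal sketch (the sample values are illustrative, not the poster's data): zero-padded, most-significant-first date strings sort chronologically under plain lexicographic comparison, even when precisions are mixed.

```python
# Zero-padded, most-significant-first date strings sort chronologically
# under plain lexicographic comparison, so mixed precisions can coexist.
dates = ["1867", "1918-11", "1800", "1918", "1800-05-02"]
chronological = sorted(dates)
print(chronological)
```

A value that is a prefix of another (e.g. "1800" vs "1800-05-02") sorts first, which matches treating the coarser value as the start of its period.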
Hi Joe,
Currently the patch does not do that, but you can do something else
that might help you get your summed stock.
In the latest patch you can include fields of collapsed documents in
the result per distinct field value.
If you specify collapse.includeCollapseDocs.fl=num_in_stock in
1. In my playing around with
sending in an XML document within an XML CDATA tag,
with termVectors=true
I noticed the following behavior:
<person>peter</person>
collapses to the term
personpeterperson
instead of
person
and
peter separately.
I realize I could try and do a search and replace of
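A minimal sketch of that search-and-replace idea (the tag and regex are illustrative): turning the XML delimiters into whitespace before indexing makes the tag name and the element text tokenize separately.

```python
import re

# Replace XML delimiters with spaces before indexing, so that the tag
# name and the element text end up as separate tokens.
raw = "<person>peter</person>"
cleaned = re.sub(r"[<>/]", " ", raw)
tokens = cleaned.split()
print(tokens)
```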
Hi all,
I have two questions about the ReversedWildcardFilterFactory:
a) should I put it into both chains, index and query, or into index only?
b) where exactly in the/each chain do I have to put it? (Do I have to
respect a certain order, as I have wordDelimiter and lowercase in
there as well?)
Hello list.
So, I set up my schema.xml with the different chains of analyzers and
filters for each field type (i.e. I created types text-en, text-de, text-it).
As I have to index documents in different languages, this is good.
But what defines the analyzers and filters for the query?
Let's suppose
Hi Claudio,
in schema.xml, the analyzer element accepts the attribute type.
If you need different analyzer chains during indexing and querying,
configure it like this:
<fieldType name="channel_name" class="solr.TextField">
<analyzer type="index">
<!-- indexing analyzer chain defined here
Hi all,
Have you planned a release date for Solr 1.4? If I understood correctly, it
will use the Lucene 2.9 release from last Sept. 24th, with a stable API?
Thanks.
Jerome.
--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net
I am trying to automate a build process that adds documents to 10 shards
over 5 machines and need to limit the size of a shard to no more than
200GB because I only have 400GB of disk available to optimize a given shard.
Why doesn't the size (du) of an index typically decrease after a commit?
On Oct 1, 2009, at 8:32 AM, Jérôme Etévé wrote:
Hi all,
Have you planned a release date for Solr 1.4? If I understood correctly, it
will use the Lucene 2.9 release from last Sept. 24th, with a stable API?
Please have a look at
It may take some time before resources are released and garbage
collected, so that may be part of the reason why things hang around
and du doesn't report much of a drop.
On Oct 1, 2009, at 8:54 AM, Phillip Farber wrote:
I am trying to automate a build process that adds documents to 10
Phillip Farber wrote:
I am trying to automate a build process that adds documents to 10
shards over 5 machines and need to limit the size of a shard to no
more than 200GB because I only have 400GB of disk available to
optimize a given shard.
Why does the size (du) of an index typically
Whoops - the way I have mail come in, it's not easy to tell if I'm replying
to the Lucene or Solr list ;)
The way Solr works with Searchers and reopen, it shouldn't run into a
situation that requires greater than
2x to optimize. I won't guarantee it ;) But based on what I know, it
shouldn't happen under
Thanks, that's exactly the kind of answer I was looking for.
Chantal Ackermann wrote:
Hi Claudio,
in schema.xml, the analyzer element accepts the attribute type.
If you need different analyzer chains during indexing and querying,
configure it like this:
<fieldType name="channel_name"
Chantal Ackermann wrote:
Hi all,
I have two questions about the ReversedWildcardFilterFactory:
a) should I put it into both chains, index and query, or into index only?
b) where exactly in the/each chain do I have to put it? (Do I have to
respect a certain order, as I have wordDelimiter and
Hi guys,
Although I've been looking at Solr on and off for a few months, I'm still
getting to grips with the schema and filters/tokenizers.
I'm having trouble using the solr.KeepWordFilterFactory functionality, and
there doesn't appear to be any previous discussion here regarding it. I
basically have
bq. and reindex without any merges.
That's actually quite a hoop to jump as well - though if you're determined
and you have tons of RAM, it's somewhat doable.
Mark Miller wrote:
Nice one ;) It's not technically a case where optimize requires 2x
though, in case the user asking gets confused. It's a
Ok, one more question on this issue. I used to have an all field where
I used to copyField title, content and keywords, defined with
typeField text, which used to have English-language dependent
analyzers/filters. Now I can copyField all the three content-* fields
as I know that only one of the
Hi.
This situation is still bugging me.
I thought I had it fixed yesterday, but no...
Seems like this goes for both deleting and adding, but I'll explain
the delete situation here:
When I'm deleting documents (~5k) from an index, I get an error message
saying
Only one usage of each socket address
Thanks, Mark!
But I suppose it does matter where in the index chain it goes? I would
guess it is applied to the tokens, so I suppose I should put it at the
very end - after WordDelimiter and Lowercase have been applied.
Is that correct?
<analyzer type="index">
<filter
I've now worked on three different search engines and they all have a
3X worst
case on space, so I'm familiar with this case. --wunder
On Oct 1, 2009, at 7:15 AM, Mark Miller wrote:
Nice one ;) It's not technically a case where optimize requires 2x
though, in case the user asking gets
I just noticed this comment in the default schema:
<!--
These types should only be used for back compatibility with existing
indexes, or if sortMissingLast functionality is needed. Use
Trie based fields instead.
-->
Does that mean TrieFields are never going to get
Oops, the missing trailing Z was probably just a cut and paste error.
It might be tough to come up with a case that can reproduce it -- it's a
sticky issue. I'll post it if I can, though.
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Tuesday,
I am trying to update to the newest version of solr from trunk as of May
5th. I updated and compiled from trunk as of yesterday (09/30/2009). When
I try to do a full import I am receiving a GC heap error after changing
nothing in the configuration files. Why would this happen in the most
recent
hello Martijn, thx for the tip. I tried that approach but ran into two
snags: 1. returning the fields makes collapsing a lot slower for
results, but that might just be the nature of iterating large results.
2. it seems like only dupes of records on the first page are returned,
or is there a
Chantal Ackermann wrote:
Thanks, Mark!
But I suppose it does matter where in the index chain it goes? I would
guess it is applied to the tokens, so I suppose I should put it at the
very end - after WordDelimiter and Lowercase have been applied.
Is that correct?
<analyzer type="index">
Sorry about asking this here, but I can't reach wiki.apache.org right now.
What do I set in query.setMaxRows() to get all the rows?
--
http://www.linkedin.com/in/paultomblin
Jeff Newburn wrote:
I am trying to update to the newest version of solr from trunk as of May
5th. I updated and compiled from trunk as of yesterday (09/30/2009). When
I try to do a full import I am receiving a GC heap error after changing
nothing in the configuration files. Why would this
Sorry, in my last question I meant setRows, not setMaxRows. What do I pass to
setRows to get all matches, not just the first 10?
-- Sent from my Palm Prē
You probably want to add the following command line option to java to
produce a heap dump:
-XX:+HeapDumpOnOutOfMemoryError
Then you can use jhat to see what's taking up all the space in the heap.
Bill
On Thu, Oct 1, 2009 at 11:47 AM, Mark Miller markrmil...@gmail.com wrote:
Jeff Newburn
Hi Andrzej,
thanks! Unfortunately, I get a ClassNotFoundException for the
solr.ReversedWildcardFilterFactory with my nightly build from 22nd of
September. I've found the corresponding JIRA issue, but from the wiki
it's not obvious that this might require a patch? I'll have a closer
look at
Andrew Clegg wrote:
hossman wrote:
This is why the examples of using context files on the wiki talk about
keeping the war *outside* of the webapps directory, and using docBase in
your Context declaration...
http://wiki.apache.org/solr/SolrTomcat
Great, I'll try it this
Hi All,
I'm trying Solr Cell outside of the example and running into trouble
because I can't refer to the
http://wiki.apache.org/solr/ExtractingRequestHandler (the wiki's down).
After realizing I needed to copy all the jars from /example/solr/lib to
my indexes /lib dir, I am now hitting
Hi folks,
I'm using the 2009-09-30 build, and any single or double quotes in the query
string cause an NPE. Is this normal behaviour? I never tried it with my
previous installation.
Example:
http://myserver:8080/solr/select/?title:%22Creatine+kinase%22
(I've also tried without the URL
On 1 Oct 09, at 12:46 PM, Tricia Williams wrote:
STREAM_SOURCE_INFO
https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2
appears to be a constant from this page:
http://lucene.apache.org/solr/api/constant-values.html
This has it embedded as an arr in the
It was added to trunk on the 11th and shouldn't require a patch. You
sure that nightly was actually built after then?
solr.ReversedWildcardFilterFactory should work fine.
Chantal Ackermann wrote:
Hi Andrzej,
thanks! Unfortunately, I get a ClassNotFoundException for the
don't forget q=... :)
Erik
On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:
Hi folks,
I'm using the 2009-09-30 build, and any single or double quotes in
the query
string cause an NPE. Is this normal behaviour? I never tried it with
my
previous installation.
Example:
Sorry! I'm officially a complete idiot.
Personally I'd try to catch things like that and rethrow a
'QueryParseException' or something -- but don't feel under any obligation to
listen to me because, well, I'm an idiot.
Thanks :-)
Andrew.
Erik Hatcher-4 wrote:
don't forget q=... :)
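For reference, a hedged sketch of building the select URL with the query text in the q parameter, as Erik's reply points out (the host, port, and field name are taken from the example above; this only shows standard URL encoding, not a Solr-specific API):

```python
from urllib.parse import urlencode

# Put the query text in the q parameter and URL-encode it;
# host, port, and field come from the example in the thread.
params = {"q": 'title:"Creatine kinase"'}
url = "http://myserver:8080/solr/select/?" + urlencode(params)
print(url)
```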
When I do a query directly from the web, the XML of the response
includes how many results would have been returned if it hadn't
restricted itself to the first 10 rows:
For instance, the query:
http://localhost:8080/solrChunk/nutch/select/?q=*:*&fq=category:mysites
returns:
<response>
<lst
Added the parameter and it didn't seem to dump when it hit the gc limit
error. Any other thoughts?
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562
From: Bill Au bill.w...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Thu, 1 Oct 2009 12:16:53 -0400
Jeff Newburn wrote:
Added the parameter and it didn't seem to dump when it hit the gc limit
error. Any other thoughts?
You might use jmap to take a look at the heap (you can do it while it's
live with Java 6), or to force a heap dump when you specify.
Since it's spending 98% of the time in GC
Mark Miller wrote:
You might use jmap to take a look at the heap (you can do it while it's
live with Java 6)
Errr - just so I don't screw anyone in a production environment - it
will freeze your app while it's getting the info.
--
- Mark
http://www.lucidimagination.com
My question is why isn't the DateField implementation of ISO 8601 broader, so
that it could include YYYY and YYYY-MM as acceptable date strings? What would
it take to do so?
Nobody ever cared? But yes, you're right, the spurious precision is
annoying. However, there is no fuzzy search for
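A hedged sketch of what such broader handling could look like outside Solr itself (the helper is hypothetical, not DateField's behavior): pad the missing fields so a partial-precision value becomes a full ISO 8601 instant that a date field can store.

```python
# Hypothetical helper: pad partial-precision dates out to full
# ISO 8601 instants (missing month/day default to January 1st).
def pad_iso(value):
    if len(value) == 4:    # year only, e.g. "1867"
        return value + "-01-01T00:00:00Z"
    if len(value) == 7:    # year-month, e.g. "1918-11"
        return value + "-01T00:00:00Z"
    return value           # assume a full instant already

print(pad_iso("1867"))
print(pad_iso("1918-11"))
```

The downside is exactly the spurious precision mentioned above: a stored "1867-01-01T00:00:00Z" no longer records that only the year was known.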
If the wiki isn't working
https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2
gave me more information. The LucidImagination article helps too.
Now that the wiki is up again it is more obvious that I need to add:
<str name="fmap.content">fulltext</str>
<str
Don't be too hard on yourself.
Sometimes, mistakes like that can happen even to the most brilliant and most
experienced.
On Thu, Oct 1, 2009 at 2:15 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
Sorry! I'm officially a complete idiot.
Personally I'd try to catch things like that and
Trie fields also do not support faceting. They also take more RAM in
some operations.
Given these defects, I'm not sure that promoting tries as the default
is appropriate at this time. (I'm sure this is an old argument. :)
On Thu, Oct 1, 2009 at 7:39 AM, Steve Conover scono...@gmail.com wrote:
I
Indeed... and the only reason I knew the answer right away is because
I've experienced this myself numerous times :)
Erik
On Oct 1, 2009, at 11:46 AM, Israel Ekpo wrote:
Don't be too hard on yourself.
Sometimes, mistakes like that can happen even to the most brilliant
and most
For future reference, the Solr Lucene wikis and mailing lists are
indexed on http://www.lucidimagination.com/search/
On Thu, Oct 1, 2009 at 11:40 AM, Tricia Williams
williams.tri...@gmail.com wrote:
If the wiki isn't working
On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote:
I am trying to update to the newest version of solr from trunk as of May
5th.
Tons of changes since... including the per-segment
searching/sorting/function queries (I think).
Do you sort on any single valued fields that
I've heard there is a new partial optimize feature in Lucene, but it
is not mentioned in the Solr or Lucene wikis so I cannot advise you
how to use it.
On a previous project we had a 500GB index for 450m documents. It took
14 hours to optimize. We found that Solr worked well (given enough RAM
for
On Thu, Oct 1, 2009 at 10:39 AM, Steve Conover scono...@gmail.com wrote:
I just noticed this comment in the default schema:
<!--
These types should only be used for back compatibility with existing
indexes, or if sortMissingLast functionality is needed. Use
Trie based fields
bq. Tons of changes since... including the per-segment
searching/sorting/function queries (I think).
Yup. I actually didn't think so, because that was committed to Lucene in
February - but it didn't come into Solr till March 10th. March 5th just
ducked it.
Yonik Seeley wrote:
On Thu, Oct 1,
Ha! Searching partial optimize on
http://www.lucidimagination.com/search , we discover SOLR-603 which
gives the 'maxSegments' option to the optimize command. The text
does not include the word 'partial'.
It's on http://wiki.apache.org/solr/UpdateXmlMessages. The command
gives a number of Lucene
On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller markrmil...@gmail.com wrote:
bq. Tons of changes since... including the per-segment
searching/sorting/function queries (I think).
Yup. I actually didn't think so, because that was committed to Lucene in
February - but it didn't come into Solr till
Whoops. There is my lazy brain for you - March, May, August - all the
same ;)
Okay - forgot Solr went straight down and used FieldSortedHitQueue.
So it all still makes sense ;)
Still interested in seeing his field sanity output to see what's possibly
being doubled.
Yonik Seeley wrote:
On Thu,
On Thu, Oct 1, 2009 at 3:37 PM, Mark Miller markrmil...@gmail.com wrote:
Still interested in seeing his field sanity output to see what's possibly
being doubled.
Strangely enough, I'm having a hard time seeing caching at the different levels.
I made a multi-segment index (2 segments), and then
1) That is correct. Including collapsed documents' fields can make your
search significantly slower (depending on how many documents are
returned).
2) It seems that you are using the parameters as intended. The
collapsed documents will contain all documents (from the whole query
result) that have
On Thu, Oct 1, 2009 at 4:05 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Thu, Oct 1, 2009 at 3:37 PM, Mark Miller markrmil...@gmail.com wrote:
Still interested in seeing his field sanity output to see what's possibly
being doubled.
Strangely enough, I'm having a hard time seeing
On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley yo...@lucidimagination.com wrote:
Since isTokenized() more reflects if something is tokenized at the
Lucene level, perhaps we need something that specifies if there is
more than one logical value per field value? I'm drawing a blank on a
good name
thx for the reply. I just want the number of dupes in the query
result, but it seems I don't get the correct totals.
For example, a non-collapsed dismax query for belgian beer returns X
results,
but when I collapse and sum the number of docs under collapse_counts,
it's much less than X
it
Is that possible? Implemented?
I want to be able to have a Solr slave instance on a publicly available host
(accessible via HTTP), and synchronize with the master securely (via HTTP).
I had it implicitly with cron jobs running as the 'root' user, and Tomcat as
'tomcat'... The slave wasn't able to update the index
Yonik Seeley wrote:
On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley yo...@lucidimagination.com
wrote:
Since isTokenized() more reflects if something is tokenized at the
Lucene level, perhaps we need something that specifies if there is
more than one logical value per field value? I'm
Ok I was able to get a heap dump from the GC limit error.
1 instance of LRUCache is taking 170MB
1 instance of SchemaIndex is taking 56MB
4 instances of SynonymMap are taking 112MB
There is no searching going on during this index update process.
Any ideas what on earth is going on? Like I said
Thanks Lance,
I have lucid's search as one of my open search tools in my browser.
Generally pretty useful (especially the ability to filter) but it's not
of much help when the tool points out that the best info is on the wiki
and the link to the wiki reveals that it can't be reached. This
I've gotten two different out of memory errors while using the field
collapsing component, using the latest patch (2009-09-26) and the
latest nightly.
Has anyone else encountered similar problems? My collection is 5
million results, but I've gotten the error collapsing as little as a few
thousand
Jeff Newburn wrote:
Ok I was able to get a heap dump from the GC Limit error.
1 instance of LRUCache is taking 170mb
1 instance of SchemaIndex is taking 56Mb
4 instances of SynonymMap is taking 112mb
There is no searching going on during this index update process.
Any ideas what on earth
I loaded the JVM and started indexing. It is a test server, so unless
some errant query came in, there was no searching. Our instance has only
512mb, but my concern is the obvious leap in memory requirements, since it
worked before. What other data would be helpful with this?
On Oct 1, 2009, at 5:14
--- On Wed, 9/23/09, Amit Nithian anith...@gmail.com wrote:
Hi Amit,
Thanks for your reply. How do I set preference for which links should
appear first or second in the search results?
Which configuration file in Solr needs to be modified to achieve this?
Regards
Bhaskar
From:
Not in time for 1.4, but yes they will eventually get it.
It has to do with the representation... currently we can't tell
between a 0 and missing.
Hmm. So does that mean that a query for latitudes, stored as trie
floats, from -10 to +10 matches documents with no (i.e. null) latitude
value?
QueryResponse#getResults()#getNumFound()
On Thu, Oct 1, 2009 at 11:49 PM, Paul Tomblin ptomb...@xcski.com wrote:
When I do a query directly form the web, the XML of the response
includes how many results would have been returned if it hadn't
restricted itself to the first 10 rows:
For
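The usual two-pass pattern behind this answer can be sketched as follows; fake_search is a stand-in for a real Solr client call, not an actual API: first query with rows=0 just to read numFound, then re-query asking for exactly that many rows.

```python
# fake_search simulates a search call: it returns the total hit count
# (numFound) plus at most `rows` documents, like a Solr-style response.
def fake_search(q, rows):
    matches = [{"id": i} for i in range(25)]  # pretend 25 docs match q
    return {"numFound": len(matches), "docs": matches[:rows]}

# Pass 1: rows=0 fetches no documents but still reports numFound.
first = fake_search("*:*", rows=0)
# Pass 2: ask for exactly numFound rows to retrieve every match.
everything = fake_search("*:*", rows=first["numFound"])
print(len(everything["docs"]))
```

Note that on a large index fetching numFound rows in one request can be very expensive; paging in batches is the safer variant of the same idea.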
On Thu, Oct 1, 2009 at 11:09 PM, Steve Conover scono...@gmail.com wrote:
Not in time for 1.4, but yes they will eventually get it.
It has to do with the representation... currently we can't tell
between a 0 and missing.
Hmm. So does that mean that a query for latitudes, stored as trie
On Thu, Oct 1, 2009 at 8:45 PM, Jeffery Newburn jnewb...@zappos.com wrote:
I loaded the jvm and started indexing. It is a test server so unless some
errant query came in then no searching. Our instance has only 512mb but my
concern is the obvious memory requirement leap since it worked before.