I have read about the option of copying this to a different field, using
one for searching by tokenizing, and one for sorting.
That would be the optimal way of doing it, since sorting requires the field
not to be analyzed/tokenized, while searching requires it. The copy
field would be the
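A minimal sketch of that two-field setup in schema.xml (field and type names are illustrative):

```xml
<!-- tokenized field, used for searching -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- single-token string copy, used only for sorting -->
<field name="title_sort" type="string" indexed="true" stored="false"/>

<copyField source="title" dest="title_sort"/>
```

Queries then search against `title` and sort with `sort=title_sort asc`.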
Thank you (and all the others who spent time answering me) very much for your
insights!
I don't know how I managed to miss CachedSqlEntityProcessor, but it seems
that's just what I need.
bye
From: Gora Mohanty [g...@mimirtech.com]
Sent:
Hi,
Maybe I am wrong, but the // should be at the beginning of the
expression, like
//xhtml:div[@class='bibliographicData']/descendant::node()
or if you want to search the div inside body, you have to use descendant like
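For example, something along these lines (an untested sketch, assuming the div sits somewhere under body):

```
/xhtml:html/xhtml:body/descendant::xhtml:div[@class='bibliographicData']
```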
Hi, I was able to do it by changing the datatype of all fields from
textTight to textgen.
I am not sure what's wrong with the textTight datatype.
Also, can you please suggest the best way to index huge database data?
Currently I have tried both the DataImportHandler and CSV import, but both are
giving almost
Hi
I am using Oracle Exadata as my DB. I want to index nearly 4 crore rows. I
have tried with specifying a batchSize of 1, and without specifying a
batchSize, but both tests take nearly the same time.
Could anyone suggest the best way to index huge amounts of data quickly?
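One knob worth checking is the JDBC fetch size on the DIH data source; with a batchSize of 1 the driver makes a round trip per row. A sketch (connection details are illustrative):

```xml
<dataSource type="JdbcDataSource"
            driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@//dbhost:1521/SERVICE"
            user="scott" password="tiger"
            batchSize="10000"/>
```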
Yeah, I tried:
//xhtml:div[@class='bibliographicData']/descendant::node()
also tried
//xhtml:div[@class='bibliographicData']
Neither worked. The DIV I need also had an ID value, and I tried both
variations with the ID as well. Still nothing.
XPath handling in Tika seems to be pretty basic and
Replication only copies new segment files, so unless you are optimizing on
commit it will not copy the entire index. Make sure you do not optimize your
index: optimizing merges everything down to a single segment and is not necessary. When new
docs are added, new small segment files are created, so typical
Thanks Hoss!
Here it is:
https://issues.apache.org/jira/browse/SOLR-2972
On Wed, Dec 14, 2011 at 4:47 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: I have been doing more tracing in the code. And I think that I understand a
: bit more. The problem does not seem to be dismax+join, but
Hmmm, I'm not sure I'm following this.
Is there a way to query the index to not give me non-null dates in return
So you want null dates?
and:
which gives me some unwanted non-null dates in the result set
which seems to indicate you do NOT want null dates.
I honestly don't know what your desired
I really don't understand what you're asking, could you clarify with
an example or two?
Best
Erick
On Wed, Dec 14, 2011 at 10:36 AM, Mark Schoy hei...@gmx.de wrote:
Hi,
I'm using the StatsComponent to retrieve the lower and upper bounds of a
price field to create a price slider.
If someone
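For reference, a minimal StatsComponent request for that use case (core URL and field name are illustrative):

```
http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price
```

The response then carries the min and max of `price`, which can seed the slider's endpoints.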
I suspect that the distributed searching is working just fine in both cases, but
your querying isn't doing what you expect due to differences in the analysis
chain. I'd recommend spending some time with the admin/analysis page
to see what is actually being parsed.
And be aware that wildcards from
We use Solr quite a bit at edelight -- and love it. However, we encountered one
minor peeve: although each individual
Solr server has its own dashboard, there's no easy way of getting a complete
overview of an entire Solr cluster and the
status of its nodes.
Over the last few weeks our own Aengus
We have a large (100M-doc) index where we add about 1M new docs per day.
We want to keep the index at a constant size, so the oldest docs are
removed and/or archived each day (so the index contains around 100 days of
data). What is the best way to do this? We still want to keep older
data in some archive
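One simple approach, assuming each doc carries an indexed timestamp field (the field name here is illustrative), is a daily delete-by-query using Solr date math, posted to the /update handler and followed by a commit:

```xml
<delete><query>timestamp:[* TO NOW-100DAYS]</query></delete>
```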
OK, thanks for the help. I'm going to try to migrate.
2011/12/14 Chris Hostetter hossman_luc...@fucit.org
: I have an old project that uses Lucene 2.9. Is it possible to use the index
: created by Lucene in Solr? May I just copy the index to the data directory of
: Solr, or is there some mechanism to import
On Wed, Dec 14, 2011 at 5:02 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
I'm a little lost in this thread ... if you are programmatically constructing
a NumericRangeQuery object to execute in the JVM against a Solr index,
that suggests you are writing some sort of Solr plugin (or
CachedSqlEntityProcessor joins your tables fine, but be aware that it works
in a single thread only.
On Thu, Dec 15, 2011 at 12:14 PM, Finotti Simone tech...@yoox.com wrote:
CachedSqlEntityProcessor
--
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype:
What about managing a core for each day?
This way the deletion/archiving is very simple, with no holes in the index (which
often happen when deleting document by document).
Indexing is done against core [today-0].
The query is done against cores [today-0],[today-1]...[today-99]. Quite a
headache.
Itamar
I think managing 100 cores will be too much of a headache. Also,
the performance of querying 100 cores will not be good (you need
page_number*page_size results from each of the 100 cores, and then a merge).
I would instead have around 10 Solr instances, each with about 10M docs.
Always search all 10 nodes. Index using some hash(doc) to
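A sketch of that kind of hash-based routing (the node URLs and id scheme are my own illustration, not from the thread):

```python
import zlib

# Ten Solr instances, ~10M docs each; URLs are illustrative.
NODES = [f"http://solr{i}:8983/solr" for i in range(10)]

def route(doc_id: str) -> str:
    """Pick the node that owns a document by hashing its unique id.

    zlib.crc32 is used because Python's built-in hash() is salted
    per process, which would send the same doc to different nodes
    across runs.
    """
    return NODES[zlib.crc32(doc_id.encode("utf-8")) % len(NODES)]
```

Every query still fans out to all 10 nodes; the hash only decides where each doc gets indexed.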
Does anybody have an idea, or better yet, measured data,
to see what the overhead of a core is, both in memory and speed?
For example, what would be the difference between having 1 core
with 100M documents versus having 10 cores with 10M documents?
I don't have any measured data, but here are my thoughts.
I think overall memory usage would be close to the same.
Speed will be slower in general, because if search speed is approximately
log(n) then 10 * log(n/10) > log(n), and also if merging results you
have overhead in the merge step, and also if
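A quick back-of-the-envelope check of that inequality (constants ignored, purely illustrative):

```python
import math

n = 100_000_000  # total docs: 1 core of n vs. 10 cores of n/10 each

# Assume per-core search cost grows like log2(number of docs in the core).
one_core = math.log2(n)
ten_cores_total = 10 * math.log2(n / 10)

print(f"one core: ~{one_core:.0f} units, ten cores combined: ~{ten_cores_total:.0f} units")
```

Each individual small core is slightly cheaper than the big one, but the combined work across ten cores, plus the merge step, is much larger.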
Hi Swapna,
Do you want to modify the *indexed* value or the *stored* value? The
analyzers modify the indexed value. To modify the stored value, the only
option that I'm aware of is to write an UpdateProcessor that changes the
document before it's indexed.
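Wiring such a processor into solrconfig.xml looks roughly like this (the factory class name is a made-up placeholder for your own implementation):

```xml
<updateRequestProcessorChain name="rewrite-stored">
  <processor class="com.example.FieldRewriteProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The update handler then selects it with `update.chain=rewrite-stored`, or via a default in the handler config.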
*Juan*
On Tue, Dec 13, 2011 at 2:05
On 12/15/2011 1:07 PM, Robert Stewart wrote:
I think overall memory usage would be close to the same.
Is this really so? I suspect that the consumed memory is in direct
proportion to the number of terms in the index. I also suspect that
if I divided 1 core with N terms into 10 smaller cores,
I am running eight cores, each core serves up different types of
searches so there is no overlap in their function. Some cores have
millions of documents. My search times are quite fast. I don't see any
real slowdown from multiple cores, but you just have to have enough
memory for them. Memory
It is true the number of terms may be much more than N/10 (or even N for
each core), but it is the number of docs per term that will really
matter. So you can have N terms in each core, but each term has 1/10 the
number of docs on average.
2011/12/15 Yury Kats yuryk...@yahoo.com:
On 12/15/2011 1:07 PM,
One other thing I did not mention is GC pauses. If you have smaller
heap sizes, you will have fewer very long GC pauses, so that can be an
advantage of having many cores (if the cores are distributed into separate
Solr instances, as separate processes). I think you can expect a 1
second pause for each GB
Hi!
I have a solrconfig.xml like:
<requestHandler name="/ABC" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="start">0</int>
    <int name="rows">10</int>
    <str name="wt">ABC</str>
    <str name="sort">score desc,rating asc</str>
    <str name="fq">CUSTOM FQ</str>
  </lst>
</requestHandler>
On 12/15/2011 1:41 PM, Robert Petersen wrote:
loading. Try it out, but make sure that the functionality you are
actually looking for isn't sharding instead of multiple cores...
Yes, but the way to achieve sharding is to have multiple cores.
The question then becomes -- how many cores
I am getting an error using the SpellChecker component with the query
another-test
java.lang.StringIndexOutOfBoundsException: String index out of range: -7
This appears to be related to this
issue https://issues.apache.org/jira/browse/SOLR-1630 which
has been marked as fixed. My configuration and
First of all, we need to clarify some terminology here: there is no such
thing as a null date in Solr -- or, for that matter, there is no such
thing as a null value in any field. Documents either have some value(s)
for a field, or they do not have any values.
If you want to constrain your
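For reference, the usual way to express "has a value" versus "has no value" in a filter query (field name illustrative):

```
fq=date_field:[* TO *]
fq=-date_field:[* TO *]
```

The first matches only documents with some value in `date_field`; the second, with the leading minus, matches only documents with none.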
If I switch back and forth between defType=dismax and defType=edismax, the
edismax doesn't seem to obey my pf parameter. I dug through the code a
little bit and in the ExtendedDismaxQParserPlugin (Solr 3.4/Solr3.5), the
part that is supposed to add the phrase comes here:
Query phrase =
: If I switch back and forth between defType=dismax and defType=edismax, the
: edismax doesn't seem to obey my pf parameter. I dug through the code a
I just tried a sample query using Solr 3.5 with the example configs+data.
This is the query I tried...
: I really don't understand what you're asking, could you clarify with
: an example or two?
I *believe* the question is about wanting to exclude the effects of some
fq params from the set of documents used to compute stats -- similar to
how you can exclude tagged filters when generating facet counts
Sure that is possible, but doesn't that defeat the purpose of sharding?
Why distribute across one machine? Just keep all in one index in that
case is my thought there...
-Original Message-
From: Yury Kats [mailto:yuryk...@yahoo.com]
Sent: Thursday, December 15, 2011 11:47 AM
To:
Hi, all!
I have a problem with distributed search. I downloaded one shard from my
production setup. It has:
* ~29M docs
* 11 fields
* ~105M terms
* shard size: 13GB
In production there are nearly 30 such shards. I split this shard into 4
smaller shards, so now I have:
small shard1:
docs:
Hi Brandon,
When I add the following to SpellingQueryConverterTest.java on the tip of
branch_3x (will be released as Solr 3.6), the test succeeds:
@Test
public void testStandardAnalyzerWithHyphen() {
SpellingQueryConverter converter = new SpellingQueryConverter();
converter.init(new
On 12/15/2011 4:46 PM, Robert Petersen wrote:
Sure that is possible, but doesn't that defeat the purpose of sharding?
Why distribute across one machine? Just keep all in one index in that
case is my thought there...
To be able to scale w/o re-indexing. Also often referred to as
Hi Steve,
I was using branch 3.5. I will try this on tip of branch_3x too.
Thanks.
On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe sar...@syr.edu wrote:
Hi Brandon,
When I add the following to SpellingQueryConverterTest.java on the tip of
branch_3x (will be released as Solr 3.6), the test
Hi,
I'm trying to implement autocomplete functionality for Address search.
I've used the KeywordTokenizerFactory with the LowerCaseFilterFactory. The problem is,
when I start typing numbers at the start, I don't get any results from Solr (e.g.
3500 W South). Could you please guide me on this?
fieldType
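One common fix is to add an edge n-gram filter at index time so that partial prefixes, including ones that start with digits like "3500 W South", can match. A sketch (type name and gram sizes are illustrative):

```xml
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```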
I'm observing strange results with both the correct and incorrect behavior
happening depending on which field I put in the 'pf' param. I wouldn't think
this should be analyzer specific, but is it?
If I try:
Yes the branch_3x works for me as well. The addition of the OffsetAttribute
probably corrected this issue. I will either switch to WhitespaceAnalyzer,
patch my distribution or wait for 3.6 to resolve this.
Thanks.
On Thu, Dec 15, 2011 at 4:17 PM, Brandon Fish brandon.j.f...@gmail.com wrote:
Hi
Hi all,
I feel like I must be missing something here...
I'm working on a customized version of the SearchHandler, which supports
distributed searching in multiple *local* cores.
Assuming you want to support SearchComponents, then my handler needs to
create/maintain a ResponseBuilder, which is
I see there are a lot of discussions about micro-sharding; I'll have to
read them. I'm on an older version of Solr and just use a master index
replicating out to a farm of slaves. It always seemed like sharding
causes a lot of background traffic to me when I read about it, but I
never tried it out.
Brandon,
Looks like SOLR-2509 https://issues.apache.org/jira/browse/SOLR-2509 fixed
the problem - that's where OffsetAttribute was added (as you noted).
I ran my test method on branches/lucene_solr_3_5/, and I got the same failure
there as you did, so I can confirm that Solr 3.5 has this bug,
Here is a talk I did on this topic at HPTS a few years ago.
On Thu, Dec 15, 2011 at 4:28 PM, Robert Petersen rober...@buy.com wrote:
I see there are a lot of discussions about micro-sharding; I'll have to
read them. I'm on an older version of Solr and just use a master index
replicating out to
Hi all,
I've run into a very strange problem.
We use a Windows server as master, serving 5 Windows slaves and 3
Linux slaves.
It has worked normally for 2 months, but today we found that one of the Linux
slaves' index files became very, very big (150GB! the others are 300MB). And we can't
find
Dmitry,
That's beyond the scope of this thread, but Munin essentially runs
plugins, which are essentially scripts that output graph configuration
and values when polled by the Munin server. It uses a plain-text
protocol, so the scripts can be written in any language. Munin
then feeds this
Thanks. I re-started from scratch and at least things have started working
now. I upgraded by deploying the 3.2 war in my JBoss, and made the config
changes mentioned in CHANGES.txt.
It expected a separate lib directory, which was not required in
1.4.
The new problem is that it's taking very long to
Hi Juan,
I think an UpdateProcessor is what I need. Can you please tell me more
about it, as to how it works?
Thanks and Regards,
Swapna.
-Original Message-
From: Juan Grande [mailto:juan.gra...@gmail.com]
Sent: Thursday, December 15, 2011 11:43 PM
To:
Hi, I am doing research in sentiment analysis. Please give your valuable
suggestions on how to start my research.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-sentiment-analysis-tp3151415p3590933.html
Sent from the Solr - User mailing list archive at Nabble.com.
This is a generic machine learning question and is not related to Solr (which
this thread is for). You can ask this question on Stackoverflow.com.
However, one of the approaches: just go through the chapter in O'Reilly's
Programming Collective Intelligence on Non-negative Matrix Factorization. That
I am interested in working on sentiment analysis. Help me.
Hi Erick,
I tried looking into our analyzers, adding each of the filters we
were using one by one and getting the documents indexed. During this testing
it was found that, when using solr.SynonymFilterFactory on top of the
latest Solr 4.0 trunk code, there is an issue with
I am seeing exceptions from some code I have written using SolrJ. I have
placed it into a pastebin:
http://pastebin.com/XnB83Jay
I am creating a MultiThreadedHttpConnectionManager object, which I use
to create an HttpClient, and that is used by all my
CommonsHttpSolrServer objects, of which